Public Thesis defense - LIDAM

SST

17 January 2020

16:00

Louvain-la-Neuve

Voie du Roman Pays, 20 - Room C115

Uncovering informative content in metabolomics: from pre-processing of 1H NMR spectra to biomarker discovery in multifactorial designs by Manon MARTIN

For the degree of Doctor of Sciences

Metabolomics is a fairly young, multidisciplinary science at the interface between the life sciences and statistics/bioinformatics, and NMR data are one of its pillars. The objectives of this thesis are threefold: (1) to go beyond black-box commercial software and uncritically applied mainstream chemometric methods by providing a clear assessment of the existing tools and shedding new light on them, (2) to suggest innovative methodologies and processing workflows for topics of current interest in metabolomics, and (3) to offer open-source resources to researchers interested in this field.

First, a new R package, PepsNMR, has been implemented. It is the only R package dedicated to a comprehensive 1H NMR pre-processing strategy for obtaining interpretable spectral profiles from raw FID data. It combines common and innovative non- or semi-parametric methods to handle instrumental artefacts and other unwanted sources of variability. After a series of improvements and validation steps, the workflow is successfully compared with a conventional pre-processing approach, using spectral repeatability measurements as the criterion. PepsNMR has been published in Bioconductor and is integrated into a web platform dedicated to metabolomics data processing (Workflow4Metabolomics).

In the second part, the classic and sparse (O)PLS algorithms for biomarker discovery are presented and explained in a unified fashion. Given the lack of such comparisons in the literature, they are then compared for feature selection using an innovative evaluation scheme: a dataset in which the exact location of the true biomarkers is known in advance is resampled under different, realistic conditions (signal-to-noise levels and sample sizes).

The last two parts of this thesis emphasize the use of ASCA in metabolomics and open up broad prospects for applications. First, ASCA and other global methods of interest (AComDim, PARAFASCA and AMOPLS) are presented under a common framework, enhanced in some respects and extended to general linear models. They are then compared extensively on an experimental dataset. Finally, ASCA is successfully extended to linear mixed models to handle more advanced designs involving random effects. New suggestions are made for each step of the applied methodology to appropriately compare fixed and random sources of variability and to test their significance.

Except for 1H NMR pre-processing, the suggested methodological developments can be applied more generally to other datasets sharing similar characteristics (multicollinearity, a low number of samples and a high number of variables). All R code and datasets are provided online; some are also included in R packages.

Jury members:

  • Prof. Bernadette Govaerts (UCLouvain), supervisor
  • Prof. Catherine Legrand (UCLouvain), chairperson
  • Prof. Michel Verleyen (UCLouvain), secretary
  • Prof. Pascal De Tullio (Center for Interdisciplinary Research on Medicines, ULiège, Belgium)
  • Prof. Paul H.C. Eilers (Erasmus University Medical Center, The Netherlands)
  • Prof. Rainer von Sachs (UCLouvain)
