Transcript Document
Canadian Bioinformatics Workshops
www.bioinformatics.ca

Module 6
David Wishart

A Typical Metabolomics Experiment

2 Routes to Metabolomics
[Figure: an NMR spectrum (ppm 7 to 1) analyzed two ways. Quantitative (Targeted) Methods: peaks labeled TMAO, hippurate, allantoin, creatinine, taurine, citrate, urea, 2-oxoglutarate, water, succinate, fumarate. Chemometric (Profiling) Methods: PC1 vs. PC2 scores plot with groups ANIT, Control and PAP.]

Metabolomics Data Workflow
Chemometric Methods
• Data Integrity Check
• Spectral alignment or binning
• Data normalization
• Data QC/outlier removal
• Data reduction & analysis
• Compound ID
Targeted Methods
• Data Integrity Check
• Compound ID and quantification
• Data normalization
• Data QC/outlier removal
• Data reduction & analysis

Data Integrity/Quality
• LC-MS and GC-MS have a high number of false positive peaks
• Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc.
• Not usually a problem with NMR
• Check using replicates and adduct calculators
MZedDB http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
HMDB http://www.hmdb.ca/search/spectra?type=ms_search

Data/Spectral Alignment
• Important for LC-MS and GC-MS studies
• Not so important for NMR (pH variation)
• Many programs available (XCMS, ChromA, MZmine)
• Most based on time warping algorithms
http://mzmine.sourceforge.net/
http://bibiserv.techfak.uni-bielefeld.de/chroma
http://metlin.scripps.edu/download/

Binning (3000 pts to 14 bins)
[Figure: a spectrum divided into bins (bin1, bin2, bin3, bin4, bin5, bin6, bin7, bin8...); each bin becomes a point (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #).]

Data Normalization/Scaling
• Can scale to sample or scale to feature
• Scaling to whole sample controls for dilution
• Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample)
• Choice depends on sample & circumstances

Same or different?
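Two of the sample-wise options listed above, normalization to integrated area and the probabilistic quotient method, can be sketched in a few lines. This is a minimal illustration, not MetaboAnalyst's implementation: the function names and the toy intensities are hypothetical, and using the feature-wise median of the area-normalized samples as the reference spectrum is an assumption.

```python
from statistics import median

def normalize_by_sum(rows):
    """Scale each sample (row) so that its intensities sum to 1."""
    return [[v / sum(row) for v in row] for row in rows]

def pqn(rows):
    """Probabilistic quotient normalization (sketch).

    Assumption: the reference spectrum is the feature-wise median of the
    area-normalized samples. Rows must have a non-zero total intensity.
    """
    scaled = normalize_by_sum(rows)
    reference = [median(col) for col in zip(*scaled)]
    result = []
    for row in scaled:
        # Quotient of every feature against the reference spectrum
        quotients = [v / r for v, r in zip(row, reference) if r > 0]
        # The median quotient is taken as the most probable dilution factor
        factor = median(quotients)
        result.append([v / factor for v in row])
    return result

# Toy data (hypothetical): the same sample at three different dilutions
samples = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 80.0],
    [5.0, 10.0, 15.0, 20.0],
]
normalized = pqn(samples)
for row in normalized:
    print(row)
```

Because the three toy samples differ only by dilution, they become identical after normalization, which is the effect "scaling to whole sample controls for dilution" refers to.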
Data Normalization/Scaling
• Can scale to sample or scale to feature
• Scaling to feature(s) helps manage outliers
• Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling
MetaboAnalyst http://www.metaboanalyst.ca
Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.

Data QC, Outlier Removal & Data Reduction
• Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification)
• Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA)
• Clustering to find similarity

MetaboAnalyst
• Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data
• Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA
• Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations & summaries
• Links significant metabolites to pathways via SMPDB
http://www.metaboanalyst.ca

[Figure: MetaboAnalyst workflow diagram]
Inputs
• GC/LC-MS raw spectra
• MS / NMR peak lists
• MS / NMR spectra bins
• Metabolite concentrations
Processing
• Peak detection
• Retention time correction
• Baseline filtering
• Peak alignment
• Data integrity check
• Missing value imputation
Data normalization
• Row-wise normalization (4)
• Column-wise normalization (4)
Statistical analysis
• Univariate analysis
• Dimension reduction
• Feature selection
• Cluster analysis
• Classification
Enrichment analysis
• Over representation analysis
• Single sample profiling
• Quantitative enrichment analysis
Pathway analysis
• Enrichment analysis
• Topology analysis
• Interactive visualization
Time-series / two factor
• Visualization
• Two-way ANOVA
• ASCA
• Temporal comparison
Resources & utilities
• Peak searching
• Pathway mapping
• Name conversion
• Lipidomics
• Metabolite set libraries
Downloads
• Processed data
• PDF report
• Images

MetaboAnalyst Overview
• Raw data processing
– Using MetaboAnalyst
• Data Reduction & Statistical analysis
– Using MetaboAnalyst
• Functional enrichment analysis
– Using MSEA in MetaboAnalyst
• Metabolic pathway analysis
– Using MetPA in MetaboAnalyst

Example Datasets

Metabolomic Data Processing

Common Tasks
• Purpose: to convert various raw data forms into data matrices suitable for statistical analysis
• Supported data formats
– Concentration tables (Targeted Analysis)
– Peak lists (Untargeted)
– Spectral bins (Untargeted)
– Raw spectra (Untargeted)

Data Upload

Alternatively …

Data Set Selected
• Here we will be selecting a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%)
• The rumen was analyzed using NMR spectroscopy using quantitative metabolomic techniques
• High grain diets are thought to be stressful on cows

Data Integrity Check
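The slides do not say which checks MetaboAnalyst runs at this step, so the following is only a sketch of what an integrity check followed by missing value imputation might look like for a concentration table. The function name, the toy table, and the choice to fill missing values with half of the smallest positive value are all assumptions made for illustration.

```python
def check_and_impute(table):
    """Validate a concentration table and fill in missing values.

    table: list of samples (rows), each a list of floats, with None
    marking a missing value. Returns (cleaned_table, missing_count).
    """
    n_cols = len(table[0])
    if any(len(row) != n_cols for row in table):
        raise ValueError("rows have different numbers of columns")

    observed = [v for row in table for v in row if v is not None]
    if any(v < 0 for v in observed):
        raise ValueError("negative concentrations found")

    positives = [v for v in observed if v > 0]
    if not positives:
        raise ValueError("no positive values to derive a fill value from")

    missing_count = sum(v is None for row in table for v in row)
    # Assumed convention: replace missing values with half of the
    # smallest positive value seen anywhere in the table.
    fill = min(positives) / 2
    cleaned = [[fill if v is None else v for v in row] for row in table]
    return cleaned, missing_count

# Toy table (hypothetical): 2 samples x 3 compounds, two values missing
raw_table = [
    [1.0, None, 4.0],
    [2.0, 0.5, None],
]
cleaned_table, n_missing = check_and_impute(raw_table)
print(n_missing, cleaned_table)
```

Other imputation choices (feature mean, median, or model-based estimates) would slot into the same place as the `fill` value.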
Data Normalization
• At this point, the data has been transformed to a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns
• MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization
• Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)
• Column-wise normalization aims to make the variables (columns) comparable to each other. This procedure is useful when variables are of very different orders of magnitude. Four methods have been implemented for this purpose: log transformation, autoscaling, Pareto scaling and range scaling

Normalization Result

Quality Control
• Dealing with outliers
– Detected mainly by visual inspection
– May be corrected by normalization
– May be excluded
• Noise reduction
– More of a concern for spectral bins/peak lists
– Usually improves downstream results

Visual Inspection
• What does an outlier look like?

Outlier Removal

Noise Reduction

Noise Reduction (cont.)
• Characteristics of noise
– Low intensities
– Low variances (default)

Data Reduction and Statistical Analysis

Common tasks
• To detect interesting patterns
• To identify important features
• To assess difference between the phenotypes
• Classification / prediction

ANOVA

View Individual Compounds

Questions
• Q: Which compounds show significant difference among all the neighboring groups (0-15, 15-30, and 30-45)?
• Q: For Uracil, are groups 15, 30, 45 significantly different from each other?

Template Matching
• Looking for compounds showing interesting patterns of change
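One simple way to look for a pattern of change is to correlate each compound's group means with a numeric template and rank the compounds by that correlation. The sketch below assumes Pearson correlation as the similarity measure; the compound names, values and template are hypothetical, and the slides do not state how MetaboAnalyst scores a match.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical group means for the four diets (0%, 15%, 30%, 45% grain)
profiles = {
    "compound_A": [1.0, 2.0, 3.0, 4.0],   # steady increase
    "compound_B": [4.0, 3.1, 2.2, 3.9],   # falls, then recovers
    "compound_C": [2.0, 2.1, 1.9, 2.0],   # roughly flat
}

# Template: decrease over the first three groups, increase in the last
template = [3, 2, 1, 3]

ranked = sorted(
    ((name, pearson(values, template)) for name, values in profiles.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Only the shape of the template matters here, since Pearson correlation ignores scale and offset; a template of [30, 20, 10, 30] would rank the compounds identically.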
Template Matching (cont.)

Question
• Q: Identify compounds that decrease in the first three groups but increase in the last group.

PCA Scores Plot

PCA Loading Plot

Question
• Q: Identify compounds that contribute most to the separation between groups 15 and 45.

PLS-DA Score Plot

Determine # of Components

Important Compounds

Model Validation

Questions
• Q: What does p < 0.01 mean?
• Q: How many permutations need to be performed if you want to claim p value < 0.0001?

Heatmap Visualization

Heatmap Visualization (cont.)

Question
• Q: Identify compounds with a low concentration in groups 0 and 15 that increase in groups 30 and 45.
• Q: Which compound is the only one significantly increased in group 45?

Download Results

Analysis Report

Metabolite Set Enrichment Analysis

Metabolite Set Enrichment Analysis (MSEA)
• Web tool designed to handle lists of metabolites (with or without concentration data)
• Modeled after Gene Set Enrichment Analysis (GSEA)
• Supports over representation analysis (ORA), single sample profiling (SSP) and quantitative enrichment analysis (QEA)
• Contains a library of 6300 predefined metabolite sets including 85 pathway sets & 850 disease sets
http://www.msea.ca

Enrichment Analysis
• Purpose: To test if there are some biologically meaningful groups of metabolites that are significantly enriched in your data
• Biologically meaningful
– Pathways
– Disease
– Localization
• Currently, only supports human metabolomic data

MSEA
• Accepts 3 kinds of input files
– 1) list of metabolite names only
– 2) list of metabolite names + concentration data from a single sample
– 3) a concentration table with a list of metabolite names + concentrations for multiple samples/patients
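For the first input type (a list of metabolite names only), over representation analysis asks whether a metabolite set contributes more members to the list than chance would predict. A common way to compute this is a one-sided hypergeometric test, which is what the sketch below assumes; the function name and the numbers in the example are hypothetical, and this is not presented as MSEA's exact procedure.

```python
from math import comb

def ora_p_value(universe_size, set_size, list_size, hits):
    """One-sided hypergeometric p-value for over representation.

    Probability of seeing at least `hits` members of a metabolite set
    (containing `set_size` metabolites) in a list of `list_size`
    metabolites drawn without replacement from `universe_size`.
    """
    upper = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 80 measurable metabolites, a pathway set of 10,
# a submitted list of 12 metabolites, 5 of which fall in the pathway.
p = ora_p_value(universe_size=80, set_size=10, list_size=12, hits=5)
print(f"ORA p-value: {p:.4g}")
```

In practice the test is repeated for every set in the chosen library, so the resulting p-values need a multiple-testing correction before they are interpreted.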
Start with a Compound List

Upload Compound List

Compound Name Standardization

Name Standardization (cont.)

Select a Metabolite Set Library

Result

Result (cont.)

The Matched Metabolite Set

Single Sample Profiling

Single Sample Profiling (cont.)

Concentration Comparison

Concentration Comparison (cont.)

Quantitative Enrichment Analysis

Data Set Selected
• Here we are using a collection of metabolites identified by NMR (compound list + concentrations) from the urine from 77 lung and colon cancer patients, some of whom were suffering from cachexia (muscle wasting)

Result

The Matched Metabolite Set

Question
• Q: Are these metabolites increased or decreased in the cachexia group?

Metabolic Pathway Analysis with MetPA

Pathway Analysis
• Purpose: to extend and enhance metabolite set enrichment analysis for pathways by
– Considering the pathway structures
– Supporting pathway visualization
• Currently supports 15 organisms

Data Upload

Data Set Selected
• Here we are using a collection of metabolites identified by NMR (compound list + concentrations) from the urine from 77 lung and colon cancer patients, some of whom were suffering from cachexia (muscle wasting)

Normalization

Pathway Libraries

Network Topology Analysis

Which Node is More Important?
High degree centrality
High betweenness centrality

Pathway Visualization

Pathway Visualization (cont.)

Question
• Q: Which pathway do you think is likely to be affected the most? Why?

Result

Not Everything Was Covered
• Clustering (K-means, SOM)
• Classification (SVM, randomForests)
• Time-series data analysis
• Two factor data analysis
• Peak searching
• ….
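The "Which Node is More Important?" slide contrasts degree centrality with betweenness centrality. The sketch below computes both on a small hypothetical graph (two triangles joined by a single bridge node) to show that the two measures can disagree. It uses Brandes' algorithm for unweighted, undirected graphs; the graph and node names are invented for illustration and do not represent a real pathway or MetPA's implementation.

```python
from collections import deque

def degree_centrality(graph):
    """Number of neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph
    (Brandes' algorithm): the number of shortest paths between other
    node pairs that pass through each node."""
    scores = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        n_paths = {v: 0 for v in graph}     # number of shortest paths
        dist = {v: -1 for v in graph}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):           # accumulate from the far end
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each pair is counted from both endpoints in an undirected graph
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical network: triangle A-B-C and triangle E-F-G, bridged by D
network = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F", "G"], "F": ["E", "G"], "G": ["E", "F"],
}
degree = degree_centrality(network)
betweenness = betweenness_centrality(network)
for node in network:
    print(node, degree[node], betweenness[node])
```

In this toy graph, node D has one of the lowest degrees but the highest betweenness, because every shortest path between the two triangles runs through it; C and E have the highest degree but slightly lower betweenness.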