Transcript Metabolomics
EMBO Practical Course on Metabolomics Bioinformatics for Life Scientists
“Dissecting an untargeted metabolomic workflow” Oscar Yanes, PhD
Untargeted metabolomics workflow
• Sample preparation
• Experimental design
• Sample analysis by MS and NMR
• Pre-processing / data analysis
• Metabolite identification
• Experimental validation
• Hypothesis
Ultimate goal of metabolomics
Disease vs. control → list of differentially regulated metabolites
• Biomarker discovery
• Pathway analysis, model construction, scientific literature → hypothesis → mechanism → validation
THE IMPORTANCE OF EXPERIMENTAL DESIGN
[Cartoon: a dialogue between ME and a COLLABORATOR]
"… I want to do metabolomics"
"I have many samples at -80 °C. Could you do metabolomics and find out something?"
"!!"
BASIC DIAGRAM OF A MASS SPECTROMETER
Sample introduction:
• Gas phase: gas chromatography
• Liquid phase: liquid chromatography, capillary electrophoresis
• Solid phase: surface-based
Ionization: electron ionization (EI), chemical ionization (CI), atmospheric pressure chemical ionization (APCI), electrospray ionization (ESI), laser desorption ionization (LDI)
Watch out for serum/plasma samples from biobanks!
[Figure: levels of glucose, pyruvic acid, lactate and choline plotted against time (0, 4, 12 and 24 h).]
Requirement for untargeted metabolomics
Maximize ionization efficiency over the whole mass range (e.g., m/z 80–1500). This increases:
• the number of features → coverage of the metabolome
• the intensity of the features → accurate quantification and identification of metabolites
How do we increase the number of features and their intensity?
[Figure: LC-MS data shown as intensity vs. mass vs. time.]
Feature: a molecular entity with a unique m/z and retention time value.
• Sample preparation: extraction method
• Chromatography: stationary phase, mobile phase
• Ion funnel technology, etc.
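The feature definition can be made concrete by treating a feature as an (m/z, retention time) pair. Two detected peaks count as the same feature only when both values agree within a tolerance. The sketch below follows that idea. The `Feature` class, the `same_feature` helper and the 10 ppm / 0.2 min tolerances are hypothetical choices for illustration, not values taken from the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical LC-MS feature: one m/z value at one retention time."""
    mz: float              # mass-to-charge ratio
    rt: float              # retention time in minutes
    intensity: float = 0.0


def ppm_error(observed: float, reference: float) -> float:
    """Mass difference expressed in parts per million of the reference."""
    return (observed - reference) / reference * 1e6


def same_feature(a: Feature, b: Feature, ppm_tol: float = 10.0, rt_tol: float = 0.2) -> bool:
    """Two peaks are one feature only if BOTH m/z and RT agree within tolerance.

    ppm_tol and rt_tol are illustrative defaults (assumptions).
    """
    return abs(ppm_error(a.mz, b.mz)) <= ppm_tol and abs(a.rt - b.rt) <= rt_tol


# Same m/z at a different retention time is a different feature.
p1 = Feature(mz=147.0764, rt=1.52)
p2 = Feature(mz=147.0766, rt=1.55)
p3 = Feature(mz=147.0764, rt=4.80)
print(same_feature(p1, p2), same_feature(p1, p3))
```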
Extraction method
Hot EtOH/ammonium acetate vs. cold acetone/MeOH: only 45% of the metabolites are detected with acetone/MeOH.
[Figure: comparison of the two extraction methods, with the MS/MS threshold marked.]
Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61
Liquid chromatography: mobile phase
Mobile-phase additives compared: ammonium fluoride, ammonium acetate and formic acid (Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61).
Chromatography: stationary phase
• HILIC vs. RP C18/C8
• Effect of pH, ammonium salts and ion pairs (e.g., TBA)
• LC flow rate and pressure: UPLC vs. HPLC vs. nanoLC (vs. GC!)
[Figure: HPLC vs. UPLC.]
PRACTICAL ASPECTS
1. Number of scans/second. Implications in LC/MS and GC/MS: quantification (maximum intensity or integrated area).
2. Instrument resolution. Implications: detector saturation, quantification.
3. Sample amount injected. Implications: detector saturation.
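The effect of scan rate on quantification can be illustrated with an idealized, noise-free Gaussian peak sampled at several scan rates. This sketch rests on that assumption, and the peak height, width and scan grid are made up. It compares the maximum sampled intensity with the trapezoid-integrated area, each expressed relative to its true value.

```python
import numpy as np

# Hypothetical peak (not from the course): height in counts, times in seconds.
HEIGHT, CENTER, SIGMA = 1.0e6, 60.0, 1.5
TRUE_AREA = HEIGHT * SIGMA * np.sqrt(2 * np.pi)  # analytic area under a Gaussian


def sampled_peak(scans_per_second, first_scan=51.0, end=70.0):
    """Scan times and intensities of an idealized Gaussian LC/GC peak."""
    t = np.arange(first_scan, end, 1.0 / scans_per_second)
    y = HEIGHT * np.exp(-0.5 * ((t - CENTER) / SIGMA) ** 2)
    return t, y


def trapezoid_area(t, y):
    """Integrated peak area by the trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(t)))


for rate in (10.0, 2.0, 0.5):
    t, y = sampled_peak(rate)
    max_rel = y.max() / HEIGHT
    area_rel = trapezoid_area(t, y) / TRUE_AREA
    print(f"{rate:4.1f} scans/s: max intensity {max_rel:.3f}, area {area_rel:.3f} (relative to true)")
```

In this idealized model, sampling once every 2 s misses the apex, so the maximum intensity falls to about 80% of the true height. The integrated area stays close to the true value. Real peaks are noisy and often non-Gaussian, so the size of the effect will differ.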
RAW METABOLOMICS DATA
FROM RAW DATA TO METABOLITE IDs
RAW DATA CONVERSION → PRE-PROCESSING → STATISTICAL ANALYSIS → METABOLITE IDENTIFICATIONS → PATHWAY ANALYSIS (for both LC/MS and GC/MS data)
LC-MS WORKFLOW
LC-MS RAW DATA → PROTEOWIZARD → mzData → PRE-PROCESSING → mZRT FEATURES TABLE → STATISTICAL ANALYSIS → IDENTIFICATION

mZRT features table (rows: features; columns: samples M1, M2, …; entries: intensity I):
          M1              M2              ...
mZRT1     I(mZRT1, M1)    ...             ...
mZRT2     ...             I(mZRT2, M2)    ...
mZRT3     ...             ...             ...
...

Feature: an individual ion with a unique mass-to-charge ratio and a unique retention time.
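Building the mZRT features table means grouping peaks from different samples that share m/z and retention time within a tolerance. The sketch below does this with a greedy grouping. The peak lists, tolerances and the `build_feature_table` helper are hypothetical. Real pre-processing software such as XCMS uses more elaborate grouping and fills in missing values rather than writing 0.0.

```python
# Hypothetical per-sample peak lists: (m/z, retention time in min, intensity).
peaks = {
    "M1": [(147.0764, 1.52, 2.1e5), (180.0634, 3.10, 8.0e4)],
    "M2": [(147.0765, 1.54, 1.9e5), (132.1019, 2.40, 5.5e4)],
}


def build_feature_table(peaks, ppm_tol=10.0, rt_tol=0.2):
    """Greedily group peaks from all samples into mZRT features.

    Returns (features, samples, matrix): features is a list of (m/z, RT) pairs,
    samples the column names, and matrix[i][k] the intensity of feature i in
    sample k (0.0 when the feature was not detected in that sample).
    """
    features, rows = [], []
    for sample, plist in peaks.items():
        for mz, rt, intensity in plist:
            for i, (f_mz, f_rt) in enumerate(features):
                if abs(mz - f_mz) / f_mz * 1e6 <= ppm_tol and abs(rt - f_rt) <= rt_tol:
                    rows[i][sample] = max(rows[i].get(sample, 0.0), intensity)
                    break
            else:  # no existing feature matched: start a new one
                features.append((mz, rt))
                rows.append({sample: intensity})
    samples = list(peaks)
    matrix = [[row.get(s, 0.0) for s in samples] for row in rows]
    return features, samples, matrix


features, samples, matrix = build_feature_table(peaks)
print("mZRT".ljust(16), *samples)
for (mz, rt), values in zip(features, matrix):
    print(f"{mz:.4f}_{rt:.2f}".ljust(16), *values)
```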
LC-MS WORKFLOW: RAW LC-MS DATA TO mzXML WITH PROTEOWIZARD
[Nature Biotechnology, 30 (918–920) (2012)]

Vendor          Format                       Converter
Agilent         MassHunter .d                ProteoWizard
Bruker          Compass .d, YEP, BAF, FID    ProteoWizard
Thermo Fisher   RAW                          ProteoWizard
Waters          MassLynx .raw                ProteoWizard
AB Sciex        WIFF                         ProteoWizard
LC-MS WORKFLOW: XCMS PRE-PROCESSING
• http://metlin.scripps.edu/download/
• Free and open source
• Based on R
• Online version available
• Suitable for GC-MS and LC-MS
Analytical Chemistry, 78(3), 779–787, 2006; Analytical Chemistry, 84(11), 5035–5039, 2012
LC-MS WORKFLOW: XCMS PRE-PROCESSING
1. FEATURE DETECTION [BMC Bioinformatics, 2008, 9:504]
Features are detected as (1) dense regions in m/z space that (2) have a Gaussian peak shape in the chromatogram.
LC-MS WORKFLOW: XCMS PRE-PROCESSING
2. RETENTION TIME CORRECTION
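Retention time correction maps each run onto a common time axis. XCMS offers its own alignment methods, such as loess smoothing over peak groups or obiwarp. The sketch below swaps in a simpler technique: piecewise-linear interpolation between landmark peaks that are assumed to be found in both runs. The landmark retention times are hypothetical.

```python
import numpy as np

# Hypothetical landmark peaks detected in both runs (retention times in minutes).
reference_landmarks = np.array([1.00, 3.00, 6.00, 9.00, 12.00])
sample_landmarks = np.array([1.05, 3.12, 6.20, 9.15, 12.10])


def correct_rt(rt, sample_lm, reference_lm):
    """Map sample retention times onto the reference time axis.

    Piecewise-linear interpolation between landmarks; note that np.interp
    clamps values outside the landmark range to the first/last reference RT.
    """
    return np.interp(rt, sample_lm, reference_lm)


observed = np.array([3.12, 4.66, 10.00])
print(np.round(correct_rt(observed, sample_landmarks, reference_landmarks), 3))
```

The clamping at the edges is a simplification. An implementation meant for real data would need a strategy for retention times outside the landmark range.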
LC-MS WORKFLOW
• 10³–10⁴ mZRT features: IDENTIFICATION NOT FEASIBLE!
• Feature redundancy:
  - adducts: [M+H]+, [M+Na]+, [M+NH4]+, [M+H-H2O]+…
  - isotopes: [M+1], [M+2], [M+3]
• Many mZRT features are noisy in nature and irrelevant to our phenomena.
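The redundancy arises because a single neutral metabolite can appear as several ions, each of which becomes its own mZRT feature. The sketch below computes the m/z of the adducts listed above and of the first isotope peaks for one neutral mass. The mass shifts are standard values for singly charged ions. The [M+1], [M+2] and [M+3] positions are approximated here as successive 13C substitutions, which is a simplification.

```python
# Ion mass shifts relative to the neutral monoisotopic mass M (positive mode).
PROTON = 1.007276
WATER = 18.010565
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.989218,
    "[M+NH4]+": 18.033823,
    "[M+H-H2O]+": PROTON - WATER,
}
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C


def adduct_mz(neutral_mass):
    """m/z of singly charged adduct ions for a given neutral mass."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def isotope_mz(ion_mz, n_peaks=3, charge=1):
    """Approximate m/z of the [M+1], [M+2], ... peaks as successive 13C substitutions."""
    return [ion_mz + k * C13_C12_DELTA / charge for k in range(1, n_peaks + 1)]


ions = adduct_mz(146.0691)
for name, mz in ions.items():
    print(f"{name:<12}{mz:.4f}")
print("isotopes of [M+H]+:", [round(x, 4) for x in isotope_mz(ions["[M+H]+"])])
```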
STATISTICAL ANALYSIS
FEATURES RANKING: features that vary according to our phenomena are retained for further identification experiments.
LC-MS WORKFLOW: FEATURES RANKING CRITERIA (I) ANALYTICAL VARIABILITY
• Randomize the worklist (injection order)
• Use QCs to check analytical variation
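$$\mathrm{CV}_{T}(\mathrm{mZRT}(j)) = \frac{S_{T}(\mathrm{mZRT}(j))}{\bar{X}_{T}(\mathrm{mZRT}(j))} \times 100$$

$$\mathrm{CV}_{QC}(\mathrm{mZRT}(j)) = \frac{S_{QC}(\mathrm{mZRT}(j))}{\bar{X}_{QC}(\mathrm{mZRT}(j))} \times 100$$

where S and X̄ are the standard deviation and the mean intensity of feature mZRT(j), computed across the samples (T) or across the QC injections (QC).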
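The two CV formulas can be computed per feature with a short sketch, assuming a feature table with features as rows. The intensities are hypothetical. The retention rule is also a hypothetical example, because the slide does not state the cut-off used: keep a feature when its QC CV is below 30% and below its CV across samples, i.e., when analytical variation is smaller than total variation.

```python
import numpy as np


def cv_percent(x, axis=1):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    x = np.asarray(x, dtype=float)
    return x.std(axis=axis, ddof=1) / x.mean(axis=axis) * 100.0


# Hypothetical intensities: rows are mZRT features, columns are injections.
samples = np.array([[100., 140., 60., 180.],    # feature 0: large spread across samples
                    [100., 102., 98., 100.],    # feature 1: nearly constant
                    [100., 300., 50., 20.]])    # feature 2
qcs = np.array([[98., 102., 100.],              # feature 0: stable in QCs
                [95., 105., 100.],              # feature 1: QC CV exceeds sample CV
                [40., 160., 100.]])             # feature 2: unstable in QCs

cv_t, cv_qc = cv_percent(samples), cv_percent(qcs)
keep = (cv_qc < 30.0) & (cv_qc < cv_t)   # illustrative thresholds (assumptions)
print(np.round(cv_t, 1), np.round(cv_qc, 1), keep)
```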
USEFUL PLOTS IN EXPLORATORY DATA ANALYSIS
• Retinas: hypoxia (N=12) vs. normoxia (N=13), #mZRT = 7654
• Neuronal cell cultures: KO (N=15) vs. WT (N=11), #mZRT = 6831
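LC-MS WORKFLOW: FEATURES RANKING CRITERIA (IV) HYPOTHESIS TESTING + FDR
#features = 4704
• α = 0.05: about 235 features are expected to vary significantly by chance (0.05 × 4704 ≈ 235), i.e., 26% of the 900 significant features.
• FDR = 0.0074: 20 features varied by chance, i.e., 5% of the 404 significant features.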
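The α = 0.05 arithmetic and a false discovery rate adjustment can both be sketched in a few lines. The slide does not state which FDR procedure produced FDR = 0.0074. The Benjamini–Hochberg adjustment below is one common choice and is shown here as an assumption, not as the procedure used in the course.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Expected false positives at a raw threshold alpha, if every feature were null.
n_features, alpha, n_significant = 4704, 0.05, 900
expected_by_chance = alpha * n_features
print(round(expected_by_chance), f"{expected_by_chance / n_significant:.0%}")
print(f"{20 / 404:.0%}")  # share of chance findings among the 404 features

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```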
LC-MS WORKFLOW
Raw data: 10M data points → #mZRT = 51908
(i) analytical variability → #mZRT = 38377
(ii) feature intensity → #mZRT = 4704
(iii) hypothesis testing + fold change → #mZRT = 250
Annotation, database look-up, identification experiments → 10–50 differential metabolites
Workflow for Metabolite Identification
Step 1: Select interesting features
Step 2: Search databases for accurate mass
Step 3: Filter the “putative” identification list
Step 4: Compare RT and MS/MS of standards
Step 2: Search databases for accurate mass
Each feature returns many hits (e.g., in HMDB and METLIN).
Common adducts: Na+, NH4+, K+, Cl−, and H2O loss.
Adducts increase the number of hits returned!
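Step 2 can be sketched as follows. Each adduct hypothesis converts the observed m/z into a different neutral mass, and each of those masses is a separate database query, which is why adducts multiply the hits. The tiny database, the 5 ppm tolerance and the `search` helper are hypothetical stand-ins for real resources such as HMDB or METLIN. The masses are monoisotopic masses computed from the listed formulas.

```python
# Hypothetical mini-database: neutral monoisotopic masses computed from formulas.
DATABASE = {
    "glutamine (C5H10N2O3)": 146.069142,
    "lysine (C6H14N2O2)": 146.105528,
    "creatine (C4H9N3O2)": 131.069477,
    "leucine (C6H13NO2)": 131.094629,
    "glucose (C6H12O6)": 180.063388,
}

# Positive-mode adduct mass shifts (ion m/z = neutral mass + shift).
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M+NH4]+": 18.033823,
    "[M+H-H2O]+": 1.007276 - 18.010565,
}


def search(mz, ppm_tol=5.0):
    """Return (adduct, compound, ppm error) for every database match."""
    hits = []
    for adduct, shift in ADDUCTS.items():
        neutral = mz - shift  # one query per adduct hypothesis
        for name, mass in DATABASE.items():
            ppm = (neutral - mass) / mass * 1e6
            if abs(ppm) <= ppm_tol:
                hits.append((adduct, name, round(ppm, 2)))
    return hits


print(search(147.0764))
print(search(169.0583))
```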
Step 3: Filter the “putative” identification list
Eliminate:
• drugs?
• intensity in the mass spectrum
• adducts?
• matches with obviously inconsistent retention times. Example: a feature with m/z 733.56 is unlikely to be a phospholipid if it has a 1-min RT with reversed-phase chromatography, because phospholipids are hydrophobic and elute late on reversed-phase columns.
Look for hits that implicate the same pathway, and give those features priority.
Standards can be expensive; your intuition will save you money and time!
What experimental data should be required to constitute a metabolite identification?
• Accurate mass?
• Retention time?
• MS/MS data?
Unlike in proteomics, no journals have requirements or guidelines for the publication of metabolite identifications.
• Accurate mass: “The identification of certain metabolites as their exact masses in their given biological context was strategic in the context of searching for biomarkers for CD.”
• Accurate mass and retention time: “…this method enables untargeted profiling of metabolites using accurate mass-retention time (AMRT) identifiers.”
• Accurate mass, retention time, and MS/MS: “Metabolites were putatively identified on the basis of accurate mass and retention time, and confirmed by comparing MS/MS data of unknowns to model compounds.”
accurate mass
Accurate mass identifications are putative
All structures shown have the same neutral mass, 146.0691.
Mass error (even if small) and adducts add more possibilities!
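Structural isomers share one molecular formula and therefore one exact mass. That is why accurate mass alone gives only putative identifications. The sketch below computes monoisotopic masses from formula strings. For instance, C5H10N2O3 (the formula of glutamine) gives 146.0691, the value on the slide. The element table and the regular-expression parser are illustrative assumptions that handle only simple formulas without brackets or charges.

```python
import re

# Monoisotopic masses of a few common elements (most abundant isotope).
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
    "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula string such as 'C5H10N2O3'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


mass = monoisotopic_mass("C5H10N2O3")
window = mass * 5e-6  # a hypothetical 5 ppm search window, in Da
print(f"{mass:.4f} Da, 5 ppm window = ±{window:.4f} Da")
```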
accurate mass and retention time
Many structural isomers have the same retention time.
Example: citrate and isocitrate have the same retention time but different MS/MS patterns.
accurate mass, retention time, and MS/MS
Step 4: Compare RT and MS/MS of standards (Q-TOF)
[Figure: MS/MS spectra (m/z 60–420) of the standard 7α-hydroxycholesterol and of the biological sample; a fragment at m/z 367.33 is labeled in both.]
Step 4: Compare RT and MS/MS of standards
Retention time will be available from the profiling experiment. However, obtaining MS/MS data for the feature of interest in the research sample typically requires another experiment.
Note: MS/MS needs to be performed on only one research sample. Pick a sample from the group in which the feature is upregulated, not from the other group!
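Visual comparison of MS/MS spectra can be complemented by a similarity score. One common choice is the cosine (dot-product) score between centroided spectra. The sketch below implements it with greedy peak matching within an m/z tolerance. The spectra, the 0.01 Da tolerance and the `cosine_similarity` helper are hypothetical. Many tools also scale intensities (e.g., square root) before scoring, which is omitted here.

```python
import math


def cosine_similarity(spec_a, spec_b, tol=0.01):
    """Cosine score between two centroided MS/MS spectra given as {m/z: intensity}.

    Each peak in spec_a is matched to the closest unused peak in spec_b
    within `tol` (Da); unmatched peaks contribute only to the norms.
    """
    dot, used = 0.0, set()
    for mz_a, int_a in spec_a.items():
        best = None
        for mz_b in spec_b:
            if mz_b not in used and abs(mz_a - mz_b) <= tol:
                if best is None or abs(mz_a - mz_b) < abs(mz_a - best):
                    best = mz_b
        if best is not None:
            used.add(best)
            dot += int_a * spec_b[best]
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b)


# Hypothetical spectra that differ mainly in one fragment's intensity.
natural_product = {112.10: 100.0, 129.10: 45.0, 200.20: 50.0}
candidate = {112.10: 10.0, 129.10: 40.0, 200.20: 50.0}
print(round(cosine_similarity(natural_product, candidate), 3))
```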
What if the feature of interest is not in the database (or the model compound is not commercially available)?
• FT-ICR MS can be used to limit the possible chemical formulas
• MS/MS can provide structural insight (MS/MS libraries, bioinformatic approaches)
• NMR can provide structural details
• When a chemist is your best friend…
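Higher mass accuracy, as offered by FT-ICR MS, narrows the mass window and therefore the list of possible formulas. The following is a brute-force sketch. It assumes only C, H, N and O, uses arbitrary element limits, and applies a simple ring-plus-double-bond (RDBE) filter. Real formula-generation tools also use isotope patterns and further chemical rules.

```python
from itertools import product

# Monoisotopic masses of the elements considered (assumption: CHNO only).
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_string(c, h, n, o):
    """Simple string such as C5H10N2O3 (counts of 1 written implicitly)."""
    return "".join(f"{el}{k if k > 1 else ''}" for el, k in zip("CHNO", (c, h, n, o)) if k)


def candidate_formulas(neutral_mass, ppm_tol, max_c=12, max_h=30, max_n=6, max_o=8):
    """Brute-force CHNO formulas whose monoisotopic mass lies within ppm_tol."""
    hits = []
    for c, n, o in product(range(max_c + 1), range(max_n + 1), range(max_o + 1)):
        heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
        for h in range(max_h + 1):
            mass = heavy + h * MASS["H"]
            if abs(mass - neutral_mass) / neutral_mass * 1e6 > ppm_tol:
                continue
            rdbe = c - h / 2 + n / 2 + 1      # rings plus double bonds
            if rdbe < 0 or (h + n) % 2:        # keep non-negative, integer RDBE
                continue
            hits.append(formula_string(c, h, n, o))
    return hits


for tol in (10.0, 1.0):
    found = candidate_formulas(146.0691, tol)
    print(f"{tol:>4} ppm: {len(found)} formulas {found}")
```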
• Thermophilic organism adapted to live at high temperatures.
• Organisms challenged with cold temperature (72 °C) and compared with high-temperature (95 °C) controls.
Feature upregulated at cold temperature: is the natural product N1-acetylthermospermine? Identification???
[Figure: MS/MS spectra of the natural product and of N1-acetylthermospermine.]
The intensity of the m/z 112 fragment is significantly different. NOT A MATCH!
Chemical synthesis of the hypothesized structure is required.
The synthesized metabolite produces MS/MS data comparable to those of the natural product from Pyrococcus furiosus.
[Figure: MS/MS spectra of the natural product, N4-(N-acetylaminopropyl)spermidine and N1-acetylthermospermine.]
Validate your metabolites!!
• Targeted metabolomics: LC and GC triple quadrupole MS
• Molecular biology techniques: immunohistochemistry, reverse transcription PCR, gene expression arrays
• Cell cultures, animal experimentation, …
Thank you
web: www.yaneslab.com
Twitter: @yaneslab