Metabolomics PCB 5530 Tom Niehaus Fall 2014

Learning Outcomes Day 1

• Lecture - Learn the basics of metabolomics - Understand the limitations of metabolomics - Things to consider when using metabolomics for your own research

Day 2

• Finish lecture • Activity 1: Identifying an unknown peak • Activity 2: Analyzing a metabolomics dataset

Definitions and Background

Metabolome = the total metabolite pool

• All low molecular weight (< 2000 Da) organic molecules in a sample such as a leaf, fruit, seedling, etc.

Sugars Nucleosides Organic acids Ketones Aldehydes Amines Amino acids Small peptides Lipids Steroids Terpenes Alkaloids Drugs (xenobiotics)

Definitions and Background

Metabolomics = high-throughput analysis of metabolites

Metabolomics is the simultaneous measurement of the levels of a large number of cellular metabolites (typically several hundred). Many of these are not identified (i.e. are just peaks in a profile).

Not hypothesis driven; gives a snapshot of metabolite levels

Definitions and Background

Metabolomics - measure many compounds

Metabolic profiling - measure a set of related compounds

Targeted analysis - measure a specific compound

[Figure: the three approaches arranged along Scope and Accuracy axes]

Definitions and Background History and Development

• Metabolic profiling is not new. Profiling for clinical detection of human disease using blood and urine samples has been carried out for centuries.

This urine wheel was published in 1506 by Ullrich Pinder, in his book Epiphanie Medicorum. The wheel describes the possible colors, smells and tastes of urine, and uses them to diagnose disease. Nicholson, J. K. & Lindon, J. C. Nature 455, 1054–1056 (2008).

Definitions and Background History and Development

• Advanced chromatographic separation techniques were developed in the late 1960s.

• Linus Pauling published “Quantitative Analysis of Urine Vapor and Breath by Gas-Liquid Partition Chromatography” in 1971 • Chuck Sweeley at MSU helped pioneer metabolic profiling using gas chromatography/mass spectrometry (GC-MS) • Plant metabolic biochemists (e.g. Lothar Willmitzer) were among other early leaders in the field.

• Metabolomics is expanding to catch up with other multiparallel analytical techniques (transcriptomics, proteomics) but remains far less developed and less accessible.

Definitions and Background Plant Metabolome Size

• It is estimated that all plant species together contain 90,000 - 200,000 compounds.

• Each individual plant species contains about 5,000 – 30,000 compounds.

e.g. ~5,000 in Arabidopsis. The plant metabolome is much larger than that of yeast, where there are far fewer metabolites than genes or proteins (<600 metabolites vs. 6000 genes). The size of the plant metabolome reflects the vast array of plant secondary compounds. This makes metabolic profiling in plants much harder than in other organisms.

Definitions and Background The Power of Metabolomics

Silent Knockout Mutations.

~90% of Arabidopsis knockout mutations are silent – i.e. have no visible phenotype and so provide no clues to gene function. (The search for some sort of visible phenotype therefore often becomes desperate.) The situation in yeast is similar – up to 85% of yeast genes are not needed for survival.

When there is little or no change in the growth rate (visible phenotype) of a knockout mutant, the pool sizes of metabolites have altered so as to compensate for the effect of the mutation, leaving metabolic fluxes unchanged. Thus, intuitively, mutations that are silent when scored for metabolic fluxes or growth rate (growth rate is the sum of all metabolic fluxes) should have obvious effects on metabolite levels. There is a firm theoretical basis for this in metabolic control analysis (MCA).

Definitions and Background The Power of Metabolomics

Example.

• In the Chloroplast 2010 project (phenotype analysis of knockouts of Arabidopsis genes encoding predicted chloroplast proteins): • Various knockouts showed essentially normal growth and color but highly abnormal free amino acid profiles, e.g. At1g50770 (‘Aminotransferase like’)

Definitions and Background Limitations of metabolomics

• High biological variance in metabolite levels (i.e., high variation between genetically identical plants grown in the same conditions) • Unlike nucleic acids and proteins, metabolites have a vast range of chemical structures and properties. Their molecular weights span two orders of magnitude (20–2000 Da). Therefore no single extraction or analysis method works for all metabolites. (Unlike DNA sequencing, microarrays, MS analysis of proteins – all are general methods.) • The concentrations of various metabolites can vary dramatically from mM to pM concentrations.

• Some metabolites are labile and won’t survive extraction and analysis • Issues with chromatography, detection, and data analysis

Metabolomics Steps in metabolomics

1) sample preparation 2) sample extraction 3) chromatography 4) detection 5) data analysis

Sample Preparation Growth/Sample Size

• Grow organisms (e.g. plants or bacteria) under identical conditions • Randomize the treatment groups (make sure the effects you measure are due to the variable being tested) • Number of replicates depends on what you want to find - Large differences = small replication needed - Small differences = large replication needed • In general, six replicates for each treatment are needed (due to high biological variability)
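The trade-off between the size of a difference and the replication needed can be made concrete with a standard normal-approximation power calculation. This is only a minimal sketch and not part of the lecture: the function name, the 5% significance level and the 80% power target are assumptions, and the approximation tends to underestimate n when groups are very small.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    effect_size (hypothetical input) is the expected difference between the
    group means divided by the within-group standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


# Large differences need few replicates, small differences need many.
for d in (2.0, 1.0, 0.5):
    print(f"effect size {d}: {replicates_per_group(d)} replicates per group")
```

With high biological variance the effect size shrinks for the same absolute difference, which is one way to read the six-replicate rule of thumb.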

Sample Preparation Sample collection

• Uniform sample sizes (e.g. hole punches in leaves) • Be consistent - similar tissue - time of day • Quickly freeze sample in liquid nitrogen, store samples at -80 °C • Fast-harvesting method for bacteria (~30 sec)

Sample Extraction Choosing an extraction method

• No universal extraction method exists • Some solvents may degrade certain compounds • It's good to have some idea of what metabolites you want to extract

Sample Extraction Sample extraction

• The method should be consistent and reproducible • Further workup may be required (e.g. solid phase extraction) [Figure: SPEX SamplePrep grinder]

Chromatography introduction

• Invented in 1900 by Mikhail Tsvet (used to separate plant pigments) • There are several types of chromatography, but all consist of a stationary phase and a mobile phase. Compounds are separated based on differential partitioning between the two phases.

• Types include: - TLC (thin-layer chromatography) - GC (gas chromatography) - LC (liquid chromatography) GC and LC are routinely used in metabolomics

• GC = ‘good chromatography’

Chromatography Gas Chromatography

• optimized over several decades • ~5 columns routinely used (5% diphenyl/95% methyl siloxane) • high reproducibility • identification based on retention time (RT) Limitations: - high temperatures can destroy labile compounds - polar compounds cannot ‘fly’ on GC columns and must first be derivatized

Chromatography Sample derivatization

Gas chromatography requires volatile compounds (two-step derivatization in vial):

Step 1) Methoximation of aldehyde and keto groups (primarily for opening reducing ring sugars)

Step 2) Silylation of polar hydroxy, thiol, carboxy and amino groups with the silylation agent MSTFA

• Z/E isomers have the same mass spectrum but differ by 2 seconds in retention time • A single compound with multiple active groups will result in multiple peaks (1TMS, 2TMS, 3TMS) • GC-MS can distinguish between stereoisomers

Kind T, Wohlgemuth G, Lee do Y, Lu Y, Palazoglu M, Shahbaz S, Fiehn O. FiehnLib: mass spectral and retention index libraries for metabolomics based on quadrupole and time-of-flight gas chromatography/mass spectrometry. Anal Chem. 2009 Dec 15;81(24):10038-48. doi: 10.1021/ac9019522.
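Because one compound can show up as several partially silylated peaks, it can help to compute the mass each derivative should have. The sketch below is illustrative only: glucose is an assumed example (the slide does not name a compound), and the function and constant names are hypothetical. It assumes methoximation adds a net CH3N per carbonyl and silylation adds a net C3H8Si per active hydrogen.

```python
# Monoisotopic masses (Da) of the elements involved.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "Si": 27.9769265}

# Net mass change per derivatized group:
#   methoximation: C=O  ->  C=N-OCH3     (net gain CH3N)
#   silylation:    X-H  ->  X-Si(CH3)3   (net gain C3H8Si)
MEOX_SHIFT = MONO["C"] + 3 * MONO["H"] + MONO["N"]
TMS_SHIFT = 3 * MONO["C"] + 8 * MONO["H"] + MONO["Si"]


def derivatized_mass(neutral_mass, n_meox, n_tms):
    """Monoisotopic mass after n_meox methoximations and n_tms silylations."""
    return neutral_mass + n_meox * MEOX_SHIFT + n_tms * TMS_SHIFT


# Assumed example: glucose, C6H12O6, one carbonyl and five hydroxy groups.
glucose = 6 * MONO["C"] + 12 * MONO["H"] + 6 * MONO["O"]
for n_tms in (3, 4, 5):
    mass = derivatized_mass(glucose, 1, n_tms)
    print(f"1 MeOX, {n_tms} TMS: {mass:.4f} Da")
```

Incomplete silylation therefore appears as a series of peaks spaced about 72.04 Da apart in neutral mass, each with its own retention time.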

• LC = ‘Lousy chromatography’

Chromatography Liquid Chromatography

• fairly new, recent advances • thousands of columns available - normal phase - reverse phase -ion exchange -HILIC • infinite solvent systems possible • low reproducibility Advantages: - compound can be collected after separation - derivatization not necessary - a separation protocol can be optimized for nearly any compound

Detection Mass Spectrometry

• Mass spectrometry is a technique to measure the mass-to-charge ratio (m/z) of ions • All mass spectrometers perform three main tasks: 1) Ionize molecules 2) Use electric and magnetic fields to accelerate ions and manipulate their flight 3) Detect ions (convert to electronic signal)

Example mass spectrum:

[Figure: GC-MS chromatogram (normalized intensity vs. time [min]) with peak selector, and the corresponding EI mass spectrum (normalized intensity vs. m/z)]

Mass Spectrometry Ionization: chemical vs electron

[Figure: Chemical Ionization (+) spectrum showing [M+H]+, [M+28]+ and [M+40]+, and Electron Ionization (+) spectrum. Settings: 70 eV; 500 uA emission; 40% CI gas; mass range 65-800; scan rate 0.2-0.03; source temperature 200 C; PushInter 40]

Accurate mass [u]: 397.1690 • Mass accuracy [ppm]: 5 • Isotopic abundance error [%]: 5 • A+1 [%]: 37.90 • A+2 [%]: 17.84 • A+3 [%]: 5.03

• [M+H]+ is very abundant in chemical ionization (CI) • Different ionization gases can be used, such as NH3, methane, butane

Example picture: adduct ions at M+28.02 = [M+C2H5]+ and M+40.04 = [M+C3H5]+ are used for verification of [M+H]+

Adduct formation – expect the unexpected

Percent [%]   Adduct ion(s)
62.55381   [M+H]+
11.44459   [M+2H]2+
8.77598   [M+H-H2O]+
6.25214   [M-H]-
5.51055   [M+Na]+
1.19494   [M+H-NH3]+
0.73715   [M+NH4]+
0.34604   [M-H-H2O]-
0.32953   [M-H+2Na]+
0.24508   [M-H+H2O]-
0.22984   [M+NH4-H2O]+
0.19429   [M+H+H2O]+
0.18286   [M+H+Na]2+
0.17524   [M+H+K]2+
0.13968   [M-2H]2-
0.13778   [M+2Na]2+
0.13714   [M+2H-NH3]2+
0.13651   [M+K]+
0.11810   [M+H-2H2O]+
0.06667   [M+3H]3+
0.06476   [M+2H-H2O]2+
0.05905   [M]+.
0.05143   [M+2Na-H]+
0.05079   [M-H+2K]+
0.04635   [M+H-CO]+
0.04318   [M+H-CO2]+
0.03810   [M+H-CH2O2]+
0.03746   [M-H-NH3]-
0.03556   [M.Cl]-
0.03111   [M+Li]+
0.02667   [M+H-C3H8O]+, [M-H-H2O-CO2]-, [M-H-H2O-HCO2H]-
0.02540   [M+H-3H2O]+, [M+H-CHN]+
0.01905   [M+K-3H]2-
0.01524   [M+H-(CH3)2NH]+
0.01333   [M+H-CHNO]+, [M+H-C2H6O]+
0.01270   [M+H-CH4O]+
0.01143   [M+H-C7H13NO3]+
0.00952   [M+Na-2H]-, [M-H-CH2O]-, [M+H-C11H12N2O3]+, [M+H-C13H16N3O4]+, [M+H-C17H25N3O4]+
0.00889   [M+CH3CO2]-
0.00825   [M-H2O+Na]+
0.00762   [M-H+NH3]-, [M+H-C9H9NO]+, [M+H-C15H21N2O3]+
0.00698   [M-2H+3Na]+
0.00635   [M+HCO2]-
0.00571   [M+H-NO2]+, [M+H-C6H13NO2]+
0.00508   [M-H-C3H5NO2]-, [M(81Br)-H]-, [M+H-HCO2H]+
0.00444   [M-2H+Li]-, [M+H-CH4]+
0.00381   [M-CCl3]+, [M-H-CO2]-, [M+H-C5H7PO6]+, [M+H-HCl]+, [M+H-C12H12N2O3]+, [M+H-CH3CO2H]+, [M+H-CH3]+., [M+H-H2]+
0.00317   [M+H-C3H8NO6P]+, [M+H-C5H14NO4P]+, [M+Li-(CH3)3N]+, [M+Li-C5H14NO4P]+, [M+Cl]-, [M(35Cl)-H]-, [M(37Cl)-H]-, [M-H-C5H7O6P]-, [M+H-C3H7O5P]+, [M-H-C6H6N8O]-, [M(81Br)+H]+, [M-C4H9]+
0.00254   [M-2H+3Li]+, [M-H-HCl]-, [M+2Li-H]+, [M+H-C8H10O2]+, [M+H-C2Cl4]+, [M-H-C7H5NO]-, [M+H-C5H11N]+, [M+Ba-H]+, [M+H-C14H25NO3]+, [M+H-C6H5NO2S]+
0.00190   [M(37Cl)]+., [M-CH3]+, [M+H-C4H11N]+, [M+H-NO2-CHO]+, [M-H-HF]-, [M(37Cl)+H]+, [M-H-C6H10O5]-, [M+H-H2O-C6H13N]+, [M+H-H2O-H3PO4]+, [M+H-C5H7PO6-NH3]+, [M-H-C5H7PO6]-, [M+H-H2S]+, [M+H-H2O-C8H8]+, [M+H-H2O-NH3-C8H8]+, [M+H-H2O-NH3-C8H8-CO]+, [M+H-H2O-NH3]+, [M+H-C3H6]+, [M+HCO2-320]-, [M+H-C3H7N]+, [M-H-H2]-, [M-H-C16H30O-H2O]-, [M-H-CH4O]-
0.00127   [M+H-C10H8FN3]+, [M+Li-C3H5NO2]+, [M+Li-H3PO4]+, [M-2H+3Li-C15H31CO2H]+, [M-2H+3Na-C3H5NO2]+, [M-2H+Na+Co]+, [M-2H+Li-C3H5NO2]-, [M-2H+Li-C16H30O]-, [M-2H+Na]-, [M-H+Co]+, [M+H-(CH3)2NH-C3H6]+, [M+H-C10H6(OH)N]+, [M-H+Ni]+, [M-H-H2O-C4H7CO2H]-, [M+H-OH]+, [M(81Br)+H]+..., [M-H-CH2O-CH2NH]-, [M+H-CO-CONH]+, [M-H-CONH]-, [M+H-C3H4O2]+, [M+H-C3H6O4]+, [M+Na-H2S]+, [M-H+2Na-H2S]+, [M-C5H5Cl]+, [M+H-N2]+, [M+H-H2O-CO]+, [M-H-H3PO4]-, [M+H+CH3CN]+, [M+H-C4H6]+, [M+H-CH3OH]+, [M+H-HCCl3]+, [M+H-C2H3N3]+, [M+H-C3H6O2]+, [M+H-CH2Cl2O]+, [M(356)+H-HCl]+, [M-C4H4O4S]+, [M+H-C8H14O3]+, [M+H-C2H4]+

…around 290 different adducts

Statistics: Adducts in NIST12 MS/MS DB (80,000 spectra). Most common adducts for LC-MS: [M+H]+, [M+Na]+, [M+NH4]+, [M+acetate]-
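To see what a few of the common adducts mean in terms of observed m/z, the shifts can be applied to a neutral monoisotopic mass. This is a minimal sketch under stated assumptions: the table name, function name and the choice of five adducts are illustrative, the constants are rounded literature values, and the example mass is the sucrose value used later in Activity 1.

```python
PROTON = 1.007276    # mass of H+ (Da)
ELECTRON = 0.000549  # electron mass (Da)
SODIUM = 22.989769   # monoisotopic Na atom (Da)
AMMONIUM = 18.034374 # neutral N + 4 H (Da)

# adduct name -> (mass added to the neutral molecule, charge)
ADDUCTS = {
    "[M+H]+": (PROTON, 1),
    "[M+Na]+": (SODIUM - ELECTRON, 1),
    "[M+NH4]+": (AMMONIUM - ELECTRON, 1),
    "[M+2H]2+": (2 * PROTON, 2),
    "[M-H]-": (-PROTON, -1),
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of one adduct of a molecule with the given neutral mass."""
    shift, charge = ADDUCTS[adduct]
    return (neutral_mass + shift) / abs(charge)


neutral = 342.116211  # sucrose monoisotopic mass (Da)
for name in ADDUCTS:
    print(f"{name:10s} {adduct_mz(neutral, name):.4f}")
```

One compound can therefore appear at several m/z values in the same run, which is why unassigned peaks are often adducts of something already identified.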

Mass Spectrometry Mass Spectrometers

• There are several types of mass spectrometers: - TOF (time of flight) - Q, QQQ (quadrupole) - Ion Trap - Orbitrap - FTICR (Fourier transform ion cyclotron resonance)

[Figure labels: Quad, TOF]

Mass Spectrometry Definitions and concepts

• isomers - compounds with the same chemical formula but different structures, e.g. propanol and isopropanol (C3H8O); C8H10N2O has 100,082,479 isomers • isobars - compounds with the same nominal mass but different exact masses, e.g. CO (27.9949) and C2H4 (28.0313) • isotopes - atoms of the same element with different numbers of neutrons in their nuclei, e.g. 12C vs 13C

Mass Spectrometry Definitions and concepts

• Resolution (resolving power): RP(FWHM) = measured mass / peak width at 50% peak intensity • Accuracy: difference between the true mass and the measured mass • Mass range: range of ions that can be detected (typically 50-1000 m/z)
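The definitions above translate directly into two small calculations. The sketch below is illustrative: the function names are hypothetical, accuracy is expressed in ppm (a common convention), and treating two peaks as separable when their spacing equals the FWHM is a rough rule of thumb rather than an instrument specification. The CO and C2H4 masses are the isobar example from the earlier slide.

```python
def resolving_power(mz, fwhm):
    """RP(FWHM) = measured mass / peak width at 50% peak intensity."""
    return mz / fwhm


def ppm_error(measured, true):
    """Mass accuracy as a signed error in parts per million."""
    return (measured - true) / true * 1e6


def min_resolving_power(mz_a, mz_b):
    """Rough RP at which the FWHM equals the spacing between two peaks."""
    return ((mz_a + mz_b) / 2) / abs(mz_a - mz_b)


co, c2h4 = 27.9949, 28.0313  # isobars from the lecture
print(f"RP needed to separate CO and C2H4: ~{min_resolving_power(co, c2h4):.0f}")
print(f"RP of a 0.05-wide peak at m/z 500: {resolving_power(500.0, 0.05):.0f}")
print(f"error of 343.1245 vs 343.123487: {ppm_error(343.1245, 343.123487):.2f} ppm")
```

Light isobars like these are easy to separate; the required resolving power grows quickly as m/z increases and mass differences shrink.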

Mass Spectrometry Why is resolution important?

• High resolution is needed to determine the accurate mass • High resolution is also needed to determine accurate isotopic patterns • Note: - monoisotopic vs average mass - accurate mass can distinguish isobars, not isomers
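The monoisotopic vs average mass distinction is easy to check numerically. This is a small sketch with assumed names: the two lookup tables hold rounded literature values for only four elements, and the formula is passed as a plain dictionary rather than parsed from a string.

```python
# Mass of the most abundant isotope of each element (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
# Abundance-weighted standard atomic weights (Da).
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(formula, table):
    """Sum element masses for a formula given as {element: count}."""
    return sum(table[element] * count for element, count in formula.items())


sucrose = {"C": 12, "H": 22, "O": 11}
mono = formula_mass(sucrose, MONOISOTOPIC)
avg = formula_mass(sucrose, AVERAGE)
print(f"monoisotopic: {mono:.6f} Da")
print(f"average:      {avg:.3f} Da")
print(f"difference:   {avg - mono:.3f} Da")
```

A high-resolution instrument measures the monoisotopic peak, so searching a dataset with the average mass would miss the compound by roughly 0.18 Da here.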

Mass Spectrometry Definitions and concepts

• Dynamic range- the concentration range over which a linear response is obtained Determines the capability of an instrument to do quantitative analysis • Sensitivity- the lowest amount an instrument can detect • matrix effects- signal is muted due to complex sample or other unknown processes • Speed- the number of spectra or scans that can be acquired in one second 1 scan/ sec = very slow 500 scans/sec = very fast

Mass Spectrometry Why is high speed important?

In order to deconvolute (separate/clean) overlapping peaks, enough mass spectra have to be acquired to perform the mathematical calculations. With only one spectrum per second this is impossible. That requires: a) fast scanning detectors like time-of-flight (TOF) b) fast data acquisition hardware/software (DAC/ADC). The LECO TOF can acquire up to 500 mass spectra per second. For GC-MS, 20 spectra/sec is sufficient; for comprehensive GC (GCxGC), up to 200 spectra/sec are needed. Source: LECO ChromaTOF Helpfile
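The link between acquisition speed and deconvolution comes down to how many spectra are recorded across one chromatographic peak. The sketch below is a back-of-the-envelope illustration; the peak widths (2 s for GC, 0.1 s for GCxGC) are assumptions chosen for the example, not values from the lecture.

```python
def spectra_across_peak(peak_width_s, spectra_per_s):
    """Number of spectra acquired while one peak elutes."""
    return peak_width_s * spectra_per_s


# Assumed peak widths in seconds (hypothetical, for illustration only).
PEAK_WIDTHS = {"GC": 2.0, "GCxGC": 0.1}

for technique, width in PEAK_WIDTHS.items():
    for rate in (1, 20, 200, 500):
        points = spectra_across_peak(width, rate)
        print(f"{technique:6s} {rate:3d} spectra/s -> {points:6.1f} spectra per peak")
```

At 1 spectrum/sec a 2 s peak is described by only two points, which is too few to separate co-eluting compounds mathematically.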

Mass Spectrometry Properties of various mass spectrometers

           Resolving Power   Dynamic Range   Sensitivity   Speed       Cost       Maintenance
TOF        very good         very good       excellent     excellent   150-300K   ave
Quad       fair              excellent       excellent     good        100K       ave
Ion Trap   fair              fair            excellent     excellent   100K       ave
Orbitrap   very good         fair            excellent     good        500K       ave
FT-ICR     excellent         fair            excellent     fair        1M         very high

Data Analysis Goals

• Huge data files • Identify all peaks (in practice this is very difficult if not impossible) • Quantification or semi-quantification of compounds - often involves comparing fold changes between samples or groups of samples, e.g. wild-type vs knockout plant • Various statistical tests to look for differences between the treatment groups, e.g. PCA, MCA, ANOVA
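A fold-change comparison between two groups can be sketched with the standard library alone. Everything here is illustrative: the peak areas are made-up numbers for one hypothetical metabolite with six replicates per group, and Welch's t statistic stands in for whichever test a real analysis would use. Converting t to a p-value needs a t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """log2 of the ratio of group means (1.0 means a doubling)."""
    return math.log2(mean(treated) / mean(control))


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical peak areas for one metabolite, six replicates per group.
wild_type = [10.1, 9.8, 10.5, 10.0, 9.6, 10.2]
knockout = [20.3, 19.5, 21.0, 20.1, 19.8, 20.6]

lfc = log2_fold_change(knockout, wild_type)
t, df = welch_t(knockout, wild_type)
print(f"log2 fold change: {lfc:.2f}")
print(f"Welch t = {t:.1f} with ~{df:.1f} degrees of freedom")
```

In a real dataset this is repeated for every peak, so some correction for multiple testing is normally applied before calling a fold change significant.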

Data Analysis Identifying peaks

• MS libraries can identify peaks (mostly GC/MS), especially when combined with RT information (GC/MS only): e.g. NIST library

Data Analysis Activity 1: Identifying peaks

• Can you find sucrose in a MS dataset?

Example: sucrose (C12H22O11)

Data Analysis Activity 1: Identifying peaks

• Accurate mass can help determine the chemical formula: Example: sucrose (C12H22O11) -Determine monoisotopic mass at http://www.chemspider.com/ (342.116211 Da) -Determine M+H from MS adduct excel sheet (class website) (343.123487 Da) Let's say you find that mass in the dataset, but is it really sucrose?

-Download Molecular weight calculator at http://www.alchemistmatt.com/mwtwin.html

-Open formula finder under tools -enter molecular weight target: 342.116211

-how many isobars are at 2 ppm? At 0.1 ppm? -enter 342.116211 at chemspider, how many isomers?
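The formula-finder step can be imitated with a brute-force search. This is only a sketch of the idea, not how the Molecular Weight Calculator works: it considers C, H, N and O only, the element limits and the simple hydrogen cap (H <= 2C + N + 2) are assumptions, and the counts it returns will differ from a tool that uses other elements or filtering rules.

```python
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def find_formulas(target, ppm, max_c=30, max_n=5, max_o=20):
    """Return (C, H, N, O) counts whose monoisotopic mass is within ppm of target."""
    tolerance = target * ppm * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                remainder = target - heavy
                if remainder < 0:
                    break  # more oxygen only makes the formula heavier
                h = round(remainder / MONO["H"])
                if h > 2 * c + n + 2:
                    continue  # more hydrogens than the valences allow
                mass = heavy + h * MONO["H"]
                if abs(mass - target) <= tolerance:
                    hits.append((c, h, n, o))
    return hits


target = 342.116211  # sucrose monoisotopic mass from the activity
for ppm in (2.0, 0.1):
    hits = find_formulas(target, ppm)
    print(f"{ppm} ppm window (+/- {target * ppm * 1e-6:.6f} Da): {len(hits)} formulas")
```

Tightening the window from 2 ppm to 0.1 ppm can only shrink the candidate list, but even a single surviving formula still leaves the isomer question open.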

Data Analysis Example output of a metabolomics experiment

• Open GC-TOF-MS dataset from class website: -How many compounds were identified? How many significant fold changes? -Pathway analysis at http://www.metaboanalyst.ca/MetaboAnalyst/ -enter compound names or KEGG IDs for significant fold changes -choose organism ‘E. coli’ and submit -Which pathways are affected in this dataset?

• Open HILIC-TOF-MS dataset from class website: -How many compounds were identified? How many significant fold changes? -How many unidentified peaks?

-Can you identify an unknown peak with a significant fold change?