Cheminformatics and mass spectrometry course

Welcome!
Mass Spectrometry meets ChemInformatics
Tobias Kind and Julie Leary
UC Davis
Course 3: Mass spectral and molecular database search
Class website: CHE 241 - Spring 2008 - CRN 16583
Slides: http://fiehnlab.ucdavis.edu/staff/kind/Teaching/
PPT is hyperlinked – please change to Slide Show Mode
1
Molecules and mass spectra
Close relationship between molecular structure and mass spectra
Molecular structure is reflected in mass spectral features
(peaks, peak heights and peak combinations)
Mass spectra reflect a state of gas phase ion physics and chemistry
(rearrangements, fragmentations, bond cleavages)
Electron impact (70 eV) mass spectra; Source: NIST05
2
Molecules and mass spectra
Similar structures may or may not have similar mass spectra
[Figure: head-to-tail comparison of two EI mass spectra, m/z axis 40-320: Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)] and N-Methylphenylethanolamine, bis(trimethylsilyl)-]
Electron impact (70 eV) mass spectra; Source: NIST05; Created using structure similarity search in NIST MS Search program
3
Molecules and mass spectra
Similar mass spectra may or may not have similar structures
[Figure: head-to-tail comparison of the EI mass spectra of 1-Tetradecene and Cyclotetradecane, m/z axis 10-210]
Electron impact (70 eV) mass spectra; Source: NIST05; Created using spectral similarity search in NIST MS Search program
4
Mass spectral databases I
Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI 70 eV)
Wiley 8         400,000         electron impact spectra (EI 70 eV)
Palisade 600K   600,000         electron impact spectra (EI 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)
Data quality is important
Annotation with CAS number, structure and formula
Links to literature or publications are useful
Currently no large ESI, APPI or APCI libraries are available (free or commercial)
5
Mass spectral databases II
Smaller specialized libraries
Pfleger Maurer Weber (Drugs) MS+RI, 70eV
MassFinder (Volatiles) MS+RI, 70eV
RIZA DB (Toxicants) MS+RI, 70eV
Golm DB (primary Metabolites) MS+RI, 70eV
Fiehnlib (primary Metabolites) MS+RI, 70eV
MassBank (Metabolites) ESI, MSn , accurate masses
AAFS (Drugs, Forensic,Toxicology), MS+RI, 70eV
ChemicalSoft (Drugs), MS/MS, MSE
[Figure: EI mass spectrum of Mirex from the RIZA library; m/z axis 230-450, labeled peaks at m/z 237, 272, 332 and 404]
(riza_web) |RI|2583|KEY|1596|CAS|2385-85-5|FRML|Empty|CMPD|Mirex|
For electron impact (EI) spectra, the same type of GC column (DB-5, RTX-5, DB-1, OV-1)
and the same temperature program must be used for matching retention indices
For ESI and APPI spectra (LC-MS), the same mass spectrometer design (triple-quad, ion-trap, TOF, Q-TOF),
setup and collision energy should be used
6
Mass spectral search algorithms
PBM - Probability Based Matching (McLafferty & Stauffer) – since 1976
Dot Product (Finnigan/INCOS) – since 1978
Weighted Dot Product (Stein) – since 1993
Mass Spectral Tree Search (Mistrik) – since the early 21st century
Weighted Dot Product:
Source: Stein S.E. see notes
Au and Ar: abundances of peaks in the user and reference mass spectra
m: m/z values
w: weighting term
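The idea of a weighted dot product can be illustrated with a short Python sketch. It assumes a commonly used form in which each peak gets a weight w = m^a * A^b and the two weighted spectra are then compared with a normalized dot product. The function names, the exponent defaults and the example spectra are illustrative assumptions, not values from the slide or from the NIST implementation.

```python
def weight(spectrum, mass_power=1.0, intensity_power=0.5):
    """Return {m/z: w} with w = m**mass_power * A**intensity_power (assumed form)."""
    return {mz: (mz ** mass_power) * (ab ** intensity_power)
            for mz, ab in spectrum.items()}

def weighted_dot_product(user, reference, mass_power=1.0, intensity_power=0.5):
    """Normalized weighted dot product in [0, 1]; spectra are {m/z: abundance} dicts."""
    wu = weight(user, mass_power, intensity_power)
    wr = weight(reference, mass_power, intensity_power)
    shared = set(wu) & set(wr)  # only peaks present in both spectra contribute
    numerator = sum(wu[mz] * wr[mz] for mz in shared) ** 2
    denominator = sum(w * w for w in wu.values()) * sum(w * w for w in wr.values())
    return numerator / denominator if denominator else 0.0

# Invented unit-resolution spectra, for illustration only.
spec_a = {43: 100.0, 55: 80.0, 69: 60.0, 83: 40.0}
spec_b = {43: 100.0, 55: 70.0, 69: 65.0, 97: 20.0}

score_same = weighted_dot_product(spec_a, spec_a)   # identical spectra
score_diff = weighted_dot_product(spec_a, spec_b)   # partially overlapping spectra
```

Raising the mass exponent gives more influence to high-mass peaks, which are usually more specific than the abundant low-mass fragments.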
7
NISTMS mass spectral search
The NIST MS Search program is the “gold standard” for EI spectral search
It can also be used for all types of unit-resolution spectra (MS/MS, APCI, ESI-MS)
8
NIST MS Search program 2.0
Search everything:
A) Library Search: Reverse, Normal, Similarity, Neutral Loss
B) Structure Similarity Search: find molecules similar to
C) Formula Search: find C11H13N3O3S
D) Constrained peak search: find peaks with m/z 122 and 188 and 266
E) Name search: find Stuntman (maleic hydrazide)
Search Connections:
Import/Export molecular structures: (msp, hpj, sdf)
Interpret Structures (MSInterpreter.exe)
Find substructures (expert algorithm)
Import spectra from other programs (AMDIS, Chemstation, ChromaTOF)
[Download] – freely available (NIST05 MS Library is licensed ~ $1200)
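The formula search and the constrained peak search listed above can be sketched in a few lines of Python. This is only a toy model of the idea, not how the NIST MS Search program is implemented; the library entries, their peak abundances and the function names are all hypothetical.

```python
# Hypothetical in-memory library; names and peak abundances are invented.
LIBRARY = [
    {"name": "compound A", "formula": "C11H13N3O3S", "peaks": {122: 40, 188: 100, 266: 15}},
    {"name": "compound B", "formula": "C6H6", "peaks": {51: 20, 78: 100}},
    {"name": "compound C", "formula": "C10H8", "peaks": {122: 5, 128: 100, 188: 2}},
]

def formula_search(library, formula):
    """Return names of entries whose molecular formula matches exactly."""
    return [entry["name"] for entry in library if entry["formula"] == formula]

def constrained_peak_search(library, required_mz, min_abundance=1):
    """Return names of entries that contain every required m/z value
    with at least min_abundance."""
    hits = []
    for entry in library:
        peaks = entry["peaks"]
        if all(peaks.get(mz, 0) >= min_abundance for mz in required_mz):
            hits.append(entry["name"])
    return hits

formula_hits = formula_search(LIBRARY, "C11H13N3O3S")
peak_hits = constrained_peak_search(LIBRARY, [122, 188, 266])
loose_hits = constrained_peak_search(LIBRARY, [122, 188])
```

Adding more required peaks, or raising `min_abundance`, narrows the hit list in the same way that adding constraints does in the search dialog.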
9
Mass Spectral Trees in Mass Frontier
MassFrontier searches MSn and CID mass spectra
10
Source: MassFrontier Helpfile
Mass Frontier MS search
MS Tree
Hitlist
11
Mass spectral search
Library search is always the first step during the identification process.
Usually library search is not enough to assign unique isomer structures.
Mass spectra must be clean and background free before search.
For LC-MS and GC-MS this requires peak picking and deconvolution.
Additional orthogonal information has to be used:
• restriction of compound space to certain species or material
• use of isotope pattern information
• use of retention index if derived from GC-MS data
• use of retention – logP or logD correlations in case of LC-MS
• additional fragmentation at different voltages (MSE)
Only certain classes of mass spectra (peptides, lipids, carbohydrates) can be predicted (calculated) in silico;
this is not generally possible for other molecules
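As one example of using orthogonal information, a library hit list can be filtered by retention index. The sketch below assumes each hit carries an optional library retention index. The hit names, match scores, measured value and tolerance are invented for illustration, and the Mirex retention index of 2583 is the value from the RIZA record shown earlier.

```python
def filter_by_retention_index(hits, measured_ri, tolerance=20.0):
    """Keep library hits whose retention index lies within
    +/- tolerance of the measured retention index."""
    kept = []
    for hit in hits:
        ri = hit.get("ri")
        # Hits without a library RI cannot be confirmed and are dropped here.
        if ri is not None and abs(ri - measured_ri) <= tolerance:
            kept.append(hit)
    return kept

# Hypothetical hit list: scores are invented.
hits = [
    {"name": "Mirex", "score": 870, "ri": 2583},
    {"name": "candidate X", "score": 880, "ri": 2310},
    {"name": "candidate Y", "score": 860, "ri": None},
]

confirmed = filter_by_retention_index(hits, measured_ri=2590, tolerance=20)
confirmed_names = [hit["name"] for hit in confirmed]
```

In this toy case the best spectral match (candidate X) is rejected because its retention index is far from the measured one, which is the reason for combining both kinds of evidence.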
12
MALDI MS based proteomics
Clinical Science
www.clinsci.org
Clin. Sci. (2005) 108, 369-383
13
LC-MS based proteomics approach
14
Source: Paul Rudnick / NIST
Proteomics data analysis (pipeline)
General approaches A) database search (Sequest, Mascot, OMSSA)
B) de-novo sequencing (Peaks, Lutefisk, Pepnovo)
C) hybrid methods (GutenTag, Popitam, Inspect)
15
Picture Source: Paul Rudnick / NIST
OMSSA - Open Mass Spectrometry Search Algorithm
• submit spectra to MS/MS search
• in-silico digestion of proteins
• matching of experimental vs. calculated MSn
• hit score computation
• inspection and review of results
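The steps above can be sketched in Python: in-silico tryptic digestion, calculation of theoretical fragment ions, and matching against observed peaks. This is a minimal illustration and not OMSSA's algorithm. The cleavage rule (after K or R, not before P), the restriction to singly charged b and y ions, the shared-peak count used as a stand-in score and all function names are simplifying assumptions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def tryptic_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def b_y_ions(peptide):
    """Singly charged b and y fragment m/z values."""
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        y.append(sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON)
    return b, y

def count_matches(observed_mz, theoretical_mz, tol=0.5):
    """Number of theoretical ions with an observed peak within tol (stand-in score)."""
    return sum(any(abs(o - t) <= tol for o in observed_mz) for t in theoretical_mz)

peptides = tryptic_digest("AKRPGK")
b_ions, y_ions = b_y_ions("AK")
matches = count_matches([72.04, 147.11], b_ions + y_ions)
```

Real search engines replace the simple match count with a statistical score and also handle missed cleavages, modifications and multiple charge states.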
Download OMSSA
16
Source: OMSSA (NCBI)
Mass spectral search of peptides (new)
17
See also ProMEX (MPIMP Golm)
Source: Paul Rudnick / NIST
Conversion of mass spectral libraries
Usually a hassle. Always keep a copy of your libraries in a non-proprietary format.
Request export functions or converters from your mass spectrometer vendor.
XCalibur LibraryManager.exe
NIST LIB2NIST.exe [LINK]
Thermo Electron Fisher Finnigan MAT
ICIS/GCQ/ITS 40 (*.lib, *.lbr)
AutoMass (*.spr, *.prs, *.nam, *.hdr, *.fsf, *.cfs)
MassLab (*.idb) to NIST and vice versa
Spectral files *.msd, *.hpj, *.sdf
HP LIB (*.LIB), NIST LIB,
JCAMP-DX, (*.jdx *.hpj)
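One reason to keep libraries in a plain-text format is that such files can be read with a few lines of code. The sketch below parses MSP-style text records. It assumes "Key: value" header lines followed by peak lines of "m/z intensity" pairs separated by spaces or semicolons, with blank lines between records. Real files may contain additional fields or peak annotations that this sketch does not handle, and the sample record contents are invented.

```python
import re

def parse_msp(text):
    """Parse MSP-style text into a list of records:
    {'fields': {key: value}, 'peaks': [(mz, intensity), ...]}."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current record
            continue
        if current is None:
            current = {"fields": {}, "peaks": []}
            records.append(current)
        if ":" in line and not line[0].isdigit():
            key, value = line.split(":", 1)      # header line such as "Name: ..."
            current["fields"][key.strip()] = value.strip()
        else:
            numbers = re.findall(r"\d+(?:\.\d+)?", line)
            for mz, inten in zip(numbers[0::2], numbers[1::2]):
                current["peaks"].append((float(mz), float(inten)))
    return records

# Invented sample records, for illustration only.
SAMPLE = """Name: Example compound
Num Peaks: 3
51 200; 77 150; 78 999

Name: Second example
Num Peaks: 2
43 999
58 300
"""

records = parse_msp(SAMPLE)
```

Once spectra are in a structure like this, writing them out again in another text format is straightforward, which is the point of keeping a non-proprietary copy.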
18
How to search molecules
Exact search
Substructure search
Similarity search
Ligand search
19
R-group/Markush search
NIST MS DB has structure similarity search
Good for comparing mass spectra of similar compounds (may have similar mass spectra)
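A similarity search can be illustrated with the Tanimoto coefficient on fingerprint feature sets, a widely used similarity measure in cheminformatics. Whether the NIST program uses exactly this measure is not stated here, so treat it as an assumption. The feature sets and molecule names below are hypothetical placeholders, not real fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def similarity_search(query_fp, database, threshold=0.5):
    """Return (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints: integers stand for structural features that are present.
DATABASE = {
    "molecule A": {1, 2, 3, 4},
    "molecule B": {1, 2, 3, 5},
    "molecule C": {7, 8},
}

results = similarity_search({1, 2, 3, 4}, DATABASE, threshold=0.5)
```

An exact search corresponds to requiring a score of 1.0 on a complete structure identifier, while a true substructure search needs graph matching and cannot be done with fingerprints alone.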
20
Searching Molecules on PubChem
18 million compound DB (++)
Go to PubChem Structure Search
21
CAS SciFinder
• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good Import/ Export, no Link outs
Download Scifinder
22
Structure search in SciFinder
Retrieved 4000 papers
(refine search only MS and MALDI)
23
How scientists publish mass spectra (*)
Today: Scientist A runs MS → publication on paper (PDF) as bitmap graphic → OCR → DB curation → DB creation → DB is sold → Scientist B, who needs the DB
Better: Scientist A → central and open repository (DB) → Scientist B
Electronic publishing in XML
Computerized free or paid curation
OCR – optical character recognition
DB – database
(*) – and structures and other spectral data
24
Open data repository for mass spectra
Submit spectra before publication (ticket system)
No loss of information (high resolution spectra)
No truncated data (report five peaks only)
No hamburger to cow algorithm needed (OCR)
Fast and instant use with no restrictions
New synergism for data interpretation
Can still cost money (curation)
Works in genomic sciences (GenBank)
Commercial use may be possible
… check out the BlueObelisk
25
The Last Page - What is important to remember
There are different search types for mass spectral data
→ similarity search, reverse search, neutral loss search, MS/MS search
There are large libraries for electron impact spectra (EI) from GC-MS
→ There are no large open/commercial libraries for spectra from LC-MS
For creation of mass spectral libraries a holistic approach is important
→ Mass spectral trees can give further information (MSE or MSn)
There are different types of searching structures
→ Exact search, similarity search, substructure search
Before you start a research project, create target lists of possible candidates
→ Collect mass spectra or structures in libraries with references
26
Reading list (20 min)
Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS
The critical evaluation of a comprehensive mass spectral library
Development and validation of a spectral library searching method for peptide identification from MS/MS
Additional reading list for very diligent and interested pupils (30 min) (*)
An MS/MS Library on an Ion-Trap Instrument for Efficient Dereplication of Natural Products.
Different Fragmentation Patterns for [M + H]+ and [M + Na]+ Ions
The History of the NIST/EPA/NIH Mass Spectral Database
(WO2006040622) DETERMINATION OF MOLECULAR STRUCTURES
USING TANDEM MASS SPECTROMETRY [Link] [PDF]
(*) Edison: “Two per cent is genius and 98 per cent is hard work”
“Bah. Genius is not inspired. Inspiration is perspiration” [SOURCE]
27
Tasks (7 min):
Should be solved and may be graded
1) Go to PubChem [LINK] or ChemSpider [LINK] and perform the 3 different
structure searches using benzene; report on the number of results
(use the sketch function to draw benzene (6-membered ring with 3 aromatic bonds))
2) Download NIST MS Search [LINK] and perform the 3 different
mass spectral searches on cocaine
(download JCAMP-DX from NIST [link])
3) Use Instant-JChem [LINK] from last course session and create a local demo
database with PubChem data.
Perform 3 different structure searches with benzene by double-clicking
on the structure search field. Report number of results.
Additional task for proteomics candidates:
4) Download the NIST peptide search [LINK] and perform a search on the given examples
28
Link List
http://www.google.com/search?hl=en&q=rearrangements%2C+fragmentations%2C+bond+cleavage&btnG=Search
High-resolution mass spectral database http://www.massbank.jp/
http://www.google.com/search?hl=en&q=mistrik+highchem&btnG=Search
http://www.google.com/search?hl=en&q=stein+se+peptide+search&btnG=Search
http://fields.scripps.edu/sequest/
http://books.google.com/books?lr=&as_brr=0&q=EDISON+Genius+++inspiration+++perspiration+++date%3A1800-1898&btnG=Search+Books
http://allured.stores.yahoo.net/idofesoilbyg.html (fragrances, terpenoid mass spectra SE-52 column + RIs)
http://kanaya.naist.jp/DrDMASS/DrDMASSInstruction.pdf
http://www.google.com/search?q=mass+spectral+libraries+NIST05&hl=en&start=10&sa=N
http://books.google.com/books?id=7IUVi06u0TQC&pg=PA114&lpg=PA114&dq=cid+mass+spectra
http://www.google.com/search?hl=en&q=cid+mass+spectra+library+pbm+dot+product&btnG=Google+Search
http://www.google.com/search?hl=en&q=%22similarity+search%22+Substructure+search%22+%22exact+search%22&btnG=Search
http://mmass.biographics.cz/
http://pubchem.ncbi.nlm.nih.gov/omssa/browser_help.htm#RunOMSSASearchLocalDialog
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1906842
http://www.google.com/search?hl=en&q=proteomics+sequest+mascot++mudpit+OMSSA&btnG=Search
http://www.google.com/search?hl=en&q=de+novo+sequencing+peaks+sequit+lutefisk&btnG=Search
29