
NMR is a powerful technique for elucidating unknown compounds, verifying proposed structures, and identifying known compounds. This poster discusses largely automated approaches to these tasks, implemented in the NMRanalyst™ software modules AssembleIt™, VerifyIt™, and FindIt™.

Most organic molecules are sufficiently protonated to be elucidated with sensitive, proton-detected (indirect detection) experiments. This leaves the acquisition of 1D carbon and 1D nitrogen spectra as the most time-consuming step. NMRanalyst can instead determine the equivalent 1D carbon and nitrogen information from the F1 frequencies of HSQC and HMBC correlations. Using spin system modeling, these frequencies can be determined to within about a fifth of the F1 resolution of the 2D spectrum used.
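The following worked example shows what "a fifth of the F1 resolution" amounts to. It uses the gHSQC acquisition parameters reported for the dihydrotestosterone data set (swF1 = 25000.00 Hz, npF1 = 256). It makes two simplifying assumptions. First, it treats sw/np as the nominal digital resolution, ignoring zero filling and whether np counts complex or real points. Second, it takes a ¹³C observe frequency of about 150.9 MHz on a 600 MHz spectrometer.

```python
def digital_resolution_hz(sweep_width_hz: float, n_points: int) -> float:
    """Nominal spacing between spectral points, in Hz per point."""
    return sweep_width_hz / n_points


def hz_to_ppm(delta_hz: float, observe_mhz: float) -> float:
    """Convert a frequency difference in Hz to ppm at the given observe frequency."""
    return delta_hz / observe_mhz


# F1 (carbon) dimension of the dihydrotestosterone gHSQC.
sw_f1_hz = 25000.00
np_f1 = 256
carbon_mhz = 150.9  # assumption: approximate 13C frequency at 600 MHz (1H)

res_hz = digital_resolution_hz(sw_f1_hz, np_f1)
fifth_hz = res_hz / 5  # precision claimed for spin-system-fitted F1 frequencies
print(f"F1 digital resolution: {res_hz:.2f} Hz/point")
print(f"A fifth of it: {fifth_hz:.2f} Hz = {hz_to_ppm(fifth_hz, carbon_mhz):.3f} ppm")
```

Under these assumptions, one fifth of roughly 97.7 Hz per point is about 19.5 Hz. That corresponds to roughly 0.13 ppm on the carbon scale.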

The lasalocid example below demonstrates sample identification using FindIt. The National Cancer Institute (NCI) has assembled over 200,000 biologically active molecular structures. These molecules can be searched at http://cactvs.cit.nih.gov, and their biological test results are online as well. FindIt identifies the NCI structures that best match the available proton and carbon NMR data.

This identification typically takes under one minute. Additional molecular structures can be added to the FindIt database, or a different structural database can be used.

Over 8 million small organic molecules have been published, as the number of assigned CAS registry numbers shows.

Even this is a tiny fraction of all possible stable small organic molecules. Sometimes a complete structure elucidation of an unknown sample is required. Our second example below illustrates the AssembleIt structure elucidation of dihydrotestosterone based solely on NMR data.

Rapid Sample Identification: NMRanalyst Dereplication, Verification, and Structure Elucidation

Reinhard Dunkel* and Xinzi Wu, ScienceSoft LLC, Sandy, UT

Data Set                  Rank of correct structure (¹H & ¹³C)
1-Indanone                1
2-Ethyl-1-indanone        1
Brucine                   3
Cortisone                 1
Dihydrotestosterone       2
Gibberellic Acid          1
Lasalocid Sodium Salt     1
Menthol                   1
Prednisone                1
Quinine                   1
Strychnine                1
Sucrose                   1
Taxol                     5
Verbenol                  1

This table lists, for each compound we tested, the rank of the correct structure among the FindIt results obtained from proton and carbon data. Most of the time the correct structure is the best-rated FindIt structure, and in all cases it is among the top five reported. A booklet of the top ten identified structures for all the compounds is attached to this poster.
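The summary statements above can be restated as top-k hit counts. The ranks below are our reading of the flattened results table, one rank per data set. The code simply counts how many correct structures are ranked first and how many fall within the top five.

```python
# FindIt rank of the correct structure for each data set (1H & 13C data),
# as read from the results table.
ranks = {
    "1-Indanone": 1,
    "2-Ethyl-1-indanone": 1,
    "Brucine": 3,
    "Cortisone": 1,
    "Dihydrotestosterone": 2,
    "Gibberellic Acid": 1,
    "Lasalocid Sodium Salt": 1,
    "Menthol": 1,
    "Prednisone": 1,
    "Quinine": 1,
    "Strychnine": 1,
    "Sucrose": 1,
    "Taxol": 5,
    "Verbenol": 1,
}


def top_k_hits(ranks: dict[str, int], k: int) -> int:
    """Count data sets whose correct structure is ranked k or better."""
    return sum(1 for rank in ranks.values() if rank <= k)


n = len(ranks)
for k in (1, 5):
    hits = top_k_hits(ranks, k)
    print(f"top-{k}: {hits}/{n} ({100 * hits / n:.0f}%)")
```

With these ranks, 11 of the 14 correct structures are rated best, and all 14 fall within the top five.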

The FindIt structure rating is generated by VerifyIt: FindIt uses VerifyIt to match the NMR data against its database structures. Besides a single rating score, VerifyIt can report detailed information on the consistency between the available NMR data and a proposed structure.
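The poster does not describe how VerifyIt computes its rating. The sketch below is therefore a hypothetical stand-in, not VerifyIt's actual score. It rates each candidate by how well its predicted ¹³C shifts agree with the observed ones, then sorts a tiny made-up "database" by that rating, FindIt-style. The sort-and-pair matching, the `scale` parameter, and all shift values are illustrative assumptions.

```python
def rating(observed: list[float], predicted: list[float], scale: float = 5.0) -> float:
    """Hypothetical agreement score in (0, 1] between observed and predicted 13C shifts.

    Shifts are paired by sorting both lists (a crude stand-in for a real
    assignment step). Each shift without a partner counts as a deviation of
    `scale` ppm. A perfect match scores 1.0.
    """
    obs, pred = sorted(observed), sorted(predicted)
    n = max(len(obs), len(pred))
    if n == 0:
        return 0.0
    deviations = [abs(o - p) for o, p in zip(obs, pred)]
    deviations += [scale] * (n - len(deviations))  # penalty for unmatched shifts
    mean_dev = sum(deviations) / n
    return 1.0 / (1.0 + mean_dev / scale)


# Made-up observed shifts and predicted shifts for a tiny candidate "database".
observed = [21.0, 35.5, 71.2, 128.4, 199.8]
database = {
    "candidate A": [21.3, 35.0, 70.8, 128.9, 200.5],
    "candidate B": [14.1, 22.7, 31.9, 60.3, 173.0],
    "candidate C": [21.0, 35.5, 71.2, 128.4],
}

ranked = sorted(database, key=lambda name: rating(observed, database[name]), reverse=True)
for name in ranked:
    print(f"{name}: {rating(observed, database[name]):.3f}")
```

Candidate A ranks first because its shifts are uniformly close. Candidate C comes second because it is penalized for one missing shift, and candidate B comes last.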

[Figure: Proton-Proton Coupling Networks From DQF-COSY]

FindIt: Lasalocid Sample Identification
Spectrometer: INOVA 800 MHz
Sample amount: ca. 2 µmol (1.2 mg)
Probe: Cold Probe
Solvent: CD₃OD
Provided by: Dr. Ronald Crouch, Varian Inc.

[Figure: 1D Proton, gHSQC, and gHMBC Spectra With Analysis Results; panels: 1D ¹H spectrum, gHSQC, gHMBC]

AssembleIt: Dihydrotestosterone Structure Elucidation
Spectrometer: AP 600 MHz
Sample amount: ca. 200 µg
Probe: 1 mm MicroProbe
Solvent: DMSO-d₆ (some H₂O)
Provided by: Dr. Till Kühn, Bruker Switzerland

Directory          npF2  NS  swF2 (Hz)  spF2 (Hz)  npF1  swF1 (Hz)  spF1 (Hz)  Time
1D proton (1)     32768   1   12376.24   -2482.69   n/a        n/a        n/a  4 s
gHSQC (5)           512  16    4595.59    -123.29   256   25000.00   -1188.44  2 h 33 m
gHMBC (6)          2048  32    5387.93    -519.46   512   37735.85   -2271.10  10 h 17 m
gDQF-COSY (7)      1024   2    4595.59    -123.29   512    4595.59    -123.29  1 h 23 m

When FindIt cannot identify a molecular structure, AssembleIt can be used for a complete structure elucidation. We recommend acquiring a set of 1D proton, gHSQC, gHMBC, and gDQF-COSY spectra. The 1,1-ADEQUATE experiment is also supported, but its roughly tenfold lower sensitivity discourages its use. The acquisition times used for this dihydrotestosterone example are overkill, but they provide nice spectral plots for illustrative purposes.
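To put "overkill" in numbers, the acquisition times listed in the parameter table can be summed. The sketch parses the table's "h/m/s" time strings and reports the total spectrometer time for the four experiments.

```python
import re

# Acquisition times as listed in the dihydrotestosterone parameter table.
times = {
    "1D proton": "4 s",
    "gHSQC": "2 h 33 m",
    "gHMBC": "10 h 17 m",
    "gDQF-COSY": "1 h 23 m",
}

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Parse strings such as '2 h 33 m' or '4 s' into seconds."""
    return sum(int(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)\s*([hms])", text))


total = sum(to_seconds(t) for t in times.values())
hours, rest = divmod(total, 3600)
minutes, seconds = divmod(rest, 60)
print(f"total: {hours} h {minutes} m {seconds} s")
print(f"gHMBC share: {to_seconds(times['gHMBC']) / total:.0%}")
```

The total comes to 14 h 13 m 4 s, of which the gHMBC alone accounts for about 72%.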

Carbon-proton bonds and carbon multiplicities for the structure elucidation are determined from the multiplicity-edited HSQC, and longer-range correlations from the HMBC. For molecules containing nitrogen (e.g., the brucine, quinine, and strychnine data sets above), an additional ¹⁵N-HMBC should be acquired. To reduce assignment ambiguities, bonds between protonated carbons can be identified unambiguously from vicinal DQF-COSY correlations.
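In a multiplicity-edited HSQC, CH₂ cross peaks appear with the opposite phase to CH and CH₃ cross peaks. Which phase counts as "negative" depends on the phasing convention. The sketch below shows one hypothetical way to turn that sign into a multiplicity call. Because the sign alone cannot separate CH from CH₃, it assumes that the summed 1D proton integral of the attached protons is available. The threshold of 2 protons is our assumption, not NMRanalyst's documented rule.

```python
def classify(sign: int, proton_integral: float, ch2_sign: int = -1) -> str:
    """Hypothetical multiplicity call for one protonated carbon.

    sign: phase of its edited-HSQC cross peak(s), +1 or -1.
    proton_integral: summed 1D 1H integral of the attached proton(s),
        normalized so that one proton integrates to about 1 (assumption).
    ch2_sign: phase that CH2 groups show under the chosen phasing convention.
    """
    if sign == ch2_sign:
        return "CH2"
    # CH and CH3 share a phase, so tell them apart by the number of protons.
    return "CH3" if proton_integral > 2.0 else "CH"


# Hypothetical cross peaks: (carbon ppm, edited-HSQC sign, summed 1H integral).
peaks = [(11.2, +1, 3.0), (35.6, -1, 2.0), (80.1, +1, 1.0)]
for carbon_ppm, sign, integral in peaks:
    print(carbon_ppm, classify(sign, integral))
```

Overlapping proton signals would make the integral unreliable in practice. A real implementation would need to handle that case.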

NMRanalyst transforms and analyzes standard NMR data. Its AssembleIt module exhaustively combines detected correlations to derive likely molecular structures. Unobserved 2-bond HMBC correlations are automatically added when derivable from observed longer-range correlations.
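The sketch below illustrates how an unobserved 2-bond correlation can follow from an observed longer-range one. Suppose the proton on C(h) shows a correlation to a carbon that is not bonded to C(h). If the two carbons share exactly one neighbor, the 3-bond path H–C(h)–C(mid)–C(target) must run through that neighbor, which implies a 2-bond H–C(mid) correlation. This is a hypothetical reconstruction of the idea on a set of already established bonds, not AssembleIt's actual procedure.

```python
def derive_two_bond(bonds, hmbc):
    """Return hypothetical derived 2-bond HMBC correlations.

    bonds: iterable of (carbon, carbon) bonds established so far.
    hmbc:  set of observed (proton_carbon, target_carbon) correlations, where
           proton_carbon is the carbon carrying the correlating proton.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    derived = set()
    for h_carbon, target in hmbc:
        near = neighbors.get(h_carbon, set())
        if target in near:
            continue  # already a 2-bond correlation
        # A 3-bond path H-C(h)-C(mid)-C(target) must pass through a shared neighbor.
        shared = near & neighbors.get(target, set())
        if len(shared) == 1:
            mid = next(iter(shared))
            if (h_carbon, mid) not in hmbc:
                derived.add((h_carbon, mid))
    return derived


bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "C4")]
observed = {("C1", "C3"), ("C4", "C2"), ("C4", "C3")}
print(sorted(derive_two_bond(bonds, observed)))
```

Here only H(C1)→C2 is derived. The analogous H(C4)→C3 correlation was already observed.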

Carbon chemical shift prediction identifies the positions and likely types of heteroatoms that NMR does not observe.

[Figure panels: 1D Proton Spectrum Before Software Water Subtraction; Long-Range Proton-Carbon Connectivities From HMBC; Determined Vicinal DQF-COSY With HSQC Connectivities]

A solid line between two protonated carbons represents an unambiguous carbon-carbon bond.

Dashed lines represent ambiguous bonds that result from ambiguous HSQC assignments.
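The solid/dashed distinction can be sketched as follows. Each vicinal DQF-COSY correlation H–C–C–H is mapped through the HSQC onto the carbons bearing the two protons. If both proton shifts map to a single carbon each, the carbon-carbon bond is unambiguous. If a proton shift matches several carbons (overlapping protons), every candidate bond is ambiguous. The sketch makes three assumptions: every listed COSY peak is vicinal, the 0.02 ppm matching tolerance holds, and the shift values are illustrative.

```python
from itertools import product


def cosy_to_cc_bonds(hsqc, cosy, tol=0.02):
    """Map vicinal COSY correlations onto carbon-carbon bonds via HSQC.

    hsqc: list of (proton_ppm, carbon_ppm) one-bond correlations.
    cosy: list of (proton_ppm_a, proton_ppm_b) correlations, assumed vicinal.
    Returns (unambiguous, ambiguous) sets of frozenset carbon pairs.
    """
    def carbons_at(h_ppm):
        return {c for hp, c in hsqc if abs(hp - h_ppm) <= tol}

    unambiguous, ambiguous = set(), set()
    for ha, hb in cosy:
        candidates = {frozenset((x, y))
                      for x, y in product(carbons_at(ha), carbons_at(hb)) if x != y}
        if len(candidates) == 1:
            unambiguous |= candidates  # drawn as a solid line
        else:
            ambiguous |= candidates    # drawn as dashed lines
    return unambiguous, ambiguous


# Illustrative data: protons at 1.60 and 1.61 ppm overlap within the tolerance.
hsqc = [(1.10, 20.0), (2.35, 35.0), (1.60, 28.0), (1.61, 40.0)]
cosy = [(1.10, 2.35), (2.35, 1.60)]
solid, dashed = cosy_to_cc_bonds(hsqc, cosy)
print(solid, dashed)
```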

AssembleIt Derived Carbon-Carbon & Carbon-Heteroatom H Correlations

The saying “garbage in, garbage out” also holds for NMR. A spectroscopist carefully inspects a spectral area before claiming that it contains a correlation; simple “peak picking” is too primitive for reliable NMR data analysis. NMRanalyst instead fits a spin system simultaneously in all acquired phase components. The spectral figures mark identified correlation areas with bounding boxes.

For the fitting, NMRanalyst calculates simulated spectral areas, which can be saved as a simulated spectrum. Spin system fitting minimizes the “garbage in” part of the automated spectral analysis.
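The sketch below contrasts fitting a model with picking the highest data point. It fits a single in-phase doublet (two Lorentzian lines) to one real, noise-free 1D trace sampled every 1 Hz. The fit uses a grid search over center and coupling, and the amplitude is solved in closed form by linear least squares. This is a deliberately simplified stand-in for NMRanalyst's simultaneous fitting over all phase components. The 1.5 Hz linewidth, the search grids, and the synthetic data are all assumptions.

```python
def lorentzian(x, center, width):
    """Unit-height Lorentzian line with full width at half maximum `width`."""
    return 1.0 / (1.0 + ((x - center) / (width / 2.0)) ** 2)


def doublet(x, center, coupling, width):
    """In-phase doublet: two equal lines split by `coupling` around `center`."""
    return (lorentzian(x, center - coupling / 2.0, width)
            + lorentzian(x, center + coupling / 2.0, width))


def fit_doublet(xs, ys, centers, couplings, width):
    """Grid-search least-squares fit; returns (residual, center, coupling, amplitude)."""
    best = None
    for c in centers:
        for j in couplings:
            shape = [doublet(x, c, j, width) for x in xs]
            amp = sum(s * y for s, y in zip(shape, ys)) / sum(s * s for s in shape)
            resid = sum((y - amp * s) ** 2 for s, y in zip(shape, ys))
            if best is None or resid < best[0]:
                best = (resid, c, j, amp)
    return best


# Synthetic trace sampled every 1 Hz, coarser than the 0.05 Hz center grid.
xs = list(range(80, 121))
ys = [2.0 * doublet(x, 100.3, 7.0, 1.5) for x in xs]

centers = [98 + 0.05 * k for k in range(81)]   # 98.00 .. 102.00 Hz
couplings = [5 + 0.1 * k for k in range(41)]   # 5.0 .. 9.0 Hz
resid, center, coupling, amp = fit_doublet(xs, ys, centers, couplings, width=1.5)
print(f"center {center:.2f} Hz, J {coupling:.1f} Hz, amplitude {amp:.2f}")
```

The fit recovers the doublet center at 100.30 Hz even though no data point lies there. This illustrates how a model fit can locate frequencies more precisely than the sampling grid.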

[Figure: AssembleIt-Derived Structure With NMRgraph-Added Oxygens (Only One Possible Structure)]

Best 10 Structures Reported by FindIt for Identified Proton and Carbon Shifts

Rank: rating (NCI compound number)
1: 0.556293 (177406)
2: 0.369931 (374130)
3: 0.366400 (674675)
4: 0.365283 (668555)
5: 0.357406 (660369)
6: 0.351521 (674674)
7: 0.337637 (282179)
8: 0.337263 (262642)
9: 0.329659 (9162)
10: 0.329215 (622258)

The best-matching structure, with a rating of 0.556293, is NCI compound 177406, lasalocid. FindIt displays these structures graphically; the top left structure is the most likely one.

[Figure: Carbon Multiplicity Determination From Edited HSQC]

[Figure: Determined HSQC Correlations]

The “?” labels identify unobserved but derived bonds.

Gray bonds to NMR-unobservable heteroatoms are added to minimize the disagreement between observed and predicted carbon shifts. Other bond labels show the proton frequency through which the bond was detected.
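The sketch below shows the idea of adding a heteroatom only where it reduces the disagreement between observed and predicted carbon shifts, in its simplest form. It assumes a shift predictor (not shown) has supplied each carbon's predicted shift with and without an attached oxygen. Each carbon is decided independently, which ignores how one added oxygen also shifts its neighbors. All labels and values are illustrative, not taken from the dihydrotestosterone data.

```python
def place_oxygens(carbons: dict[str, tuple[float, float, float]]) -> set[str]:
    """Pick carbons whose observed shift is better explained with an attached oxygen.

    carbons maps a carbon label to (observed ppm, predicted ppm without O,
    predicted ppm with O). Predictions are assumed to come from some carbon
    shift predictor that is not shown here.
    """
    return {
        name
        for name, (observed, pred_plain, pred_oxygen) in carbons.items()
        if abs(observed - pred_oxygen) < abs(observed - pred_plain)
    }


# Illustrative numbers only.
example = {
    "C-a": (80.1, 47.0, 81.5),    # carbinol-like shift: oxygen fits better
    "C-b": (36.4, 35.9, 72.0),    # aliphatic shift: no oxygen
    "C-c": (210.3, 45.0, 209.0),  # ketone-like shift: C=O oxygen fits better
}
print(sorted(place_oxygens(example)))
```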

AssembleIt’s generative structure elucidation is uncommon, perhaps even novel. NMR cannot determine a molecular formula (MF), and the MF is often unknown; rather than relying on one, AssembleIt builds structures directly from the detected correlations. This generative approach resembles how a human elucidates a structure, but it is exhaustive and typically faster than MF-based molecular generators. Most AssembleIt structure elucidations result in several possible structures, which are then ordered by the agreement between predicted and observed chemical shifts. Because a chemical shift reflects the chemical environment of a nucleus, close agreement between predicted and observed shifts is a good indication of the correct structure.

HMBC correlations are highly ambiguous to interpret: is a given HMBC correlation a 2-bond, 3-bond, or n-bond proton-carbon coupling? The spectrum alone does not say. During the automated structure generation, however, most of these ambiguities are resolved. AssembleIt produced complete structure elucidations for all the compounds we tested except the taxol and lasalocid data sets above. AssembleIt structure generation typically takes under one minute and yields fewer than 100 possible structures.
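One way candidate structures can resolve HMBC ambiguity is illustrated below. Suppose a proton sits on C(h) and correlates with C(t). The H–C coupling then spans one bond more than the C(h)–C(t) path, so a 2- or 3-bond correlation requires C(h) and C(t) to be 1 or 2 bonds apart. The sketch keeps only candidates that satisfy every correlation. Treating all correlations as 2- or 3-bond (`max_bonds=3`) is our assumption; the poster does not say how AssembleIt weighs n-bond correlations.

```python
from collections import deque


def adjacency_from_bonds(bonds):
    """Build an undirected adjacency map from (atom, atom) bonds."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bond_distance(adj, start, goal):
    """Number of bonds on the shortest path between two atoms (breadth-first search)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        for nxt in adj.get(atom, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected


def consistent(adj, hmbc, max_bonds=3):
    """True if every HMBC correlation H(on c_h) -> c_t spans 2..max_bonds bonds."""
    for c_h, c_t in hmbc:
        d = bond_distance(adj, c_h, c_t)
        if d is None or not 1 <= d <= max_bonds - 1:
            return False
    return True


# Two hypothetical four-carbon candidates and three observed correlations.
chain = adjacency_from_bonds([("a", "b"), ("b", "c"), ("c", "d")])
branched = adjacency_from_bonds([("a", "b"), ("b", "c"), ("b", "d")])
hmbc = [("a", "c"), ("d", "b"), ("a", "d")]
print("chain:", consistent(chain, hmbc), "branched:", consistent(branched, hmbc))
```

The chain is rejected because H(a)→C(d) would be a 4-bond correlation there. In the branched candidate it is a 3-bond correlation.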

NMRanalyst 3.2, used for this poster, has been tested on Red Hat Linux (8.0 and 9), Microsoft Windows (98SE, ME, 2000, and XP), and Sun SPARC workstations running Solaris (8 or 9).

For commercial availability, please ask or see http://www.ScienceSoft.net.

Center atoms are labeled with their carbon shifts in ppm, surrounded by the correlated proton frequencies.

Green represents CH, blue CH₂, and orange CH₃ and H.

Acknowledgements

Varian Inc., Bruker Biospin, and the University of Mainz contributed the NMR data sets. This work was made possible by NIH SBIR Phase II 5 R44 MH061652 funding.
