Transcript ppt
Structure Validation Challenges in Chemical Crystallography
Ton Spek, Utrecht University, The Netherlands.
Madrid, Aug. 26, 2011
Validation History
• Validation of structures supplied in computer-readable CIF format was pioneered by Acta Cryst. C (Syd Hall et al., 1990s).
• Initially, the numerical checking of papers submitted to Acta Cryst. C in CIF format was done by the Chester staff.
• Subsequently, automated checking of the CIF for data consistency, completeness and validity (checkCIF) was introduced.
• PLATON facilities to check for missed symmetry and voids were added later.
• These were soon followed by numerous other PLATON-based tests (PLATxxx) of the reported structure (currently more than 400).
checkCIF/PLATON
FCF Validation
• Fo/Fc reflection file deposition and archival in CIF format (FCF) was made mandatory early on for Acta Cryst. papers.
• This allows subsequent analysis of possibly unique data.
• CIF + FCF checking was added to the IUCr checkCIF/PLATON suite in 2010.
• Major chemical journals now require CIF deposition and validation reports, but not (yet) deposition of the reflection data.
• The CCDC now accepts FCFs for deposition.
Why Automated Structure Validation?
• The large volume of new and routine structure reports submitted for publication.
• The limited number of experienced crystallographic referees available for validation.
• Detection of errors due to the black-box use of crystallography by non-crystallographers.
• Setting standards of quality and reliability.
• Automated detection of unusual, though not necessarily erroneous, issues that need special attention (reported as level A, B, C or G ALERTS).
• Sadly: the need to detect fraudulent structures.
Systematic Fraud
• In late 2009 a massive fraud was detected, involving structures mainly published around 2007 in Acta Cryst. E (soon 200 retractions!).
• Before 2010, nobody was prepared for serious and systematic fraud in this non-competitive field of routine structure determination.
• Deviations from the expected results can often be explained by errors, inexperience or poor data.
• In hindsight, several retractions before 2010 may have concerned fraudulent structures rather than errors.
• Ongoing testing of our validation software on the archived data of structures published in Acta Cryst. E often flagged suspect structures needing more detailed investigation.
• A fraud pattern emerged only after one such strange structure report was followed up with an analysis of all structures published by the authors of that paper.
• It was discovered that the same data set had been used to publish a series of invented isomorphous structures.
• Full story: the Acta Cryst. E (2010) editorial and a PowerPoint presentation by the Section E editor Jim Simpson (IUCr website).
Bogus Variations (with Hirshfeld ALERTS) on the Published Structure of 2-hydroxy-3,5-dinitrobenzoic acid (ZAJGUM)
[Figure panels: OH => NH2; NO2 => COOH; H2O => NH3; OH => F]
Fraud Detection Tools
• Generalized Hirshfeld Rigid Bond Test.
• CIF versus FCF data checking.
• Scatter Plots of the reflection data of the same or related structure(s).
• Inspection of difference maps for unusual features.
• SHELXL re-refinement using the supplied CIF & FCF data.
• Search of the CSD for related structures.
• Two case studies that illustrate the use of the above validation and analysis tools follow.
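The rigid-bond test in the first bullet rests on a simple quantity: for two bonded atoms A and B, the mean-square displacement amplitudes along the A-B bond, z² = nᵀUn with n the unit bond vector, should be nearly equal. The Python sketch below implements the classic pairwise Hirshfeld test, not PLATON's generalized version. It assumes the displacement tensors have already been converted to a Cartesian frame (CIF Uij values refer to the crystal axes and need a transformation first) and uses Hirshfeld's often-quoted limit of 0.001 Å² for atoms at least as heavy as carbon as an illustrative default; the function names and example tensors are hypothetical.

```python
import numpy as np


def msda_along(u_cart, direction):
    """Mean-square displacement amplitude (Å^2) of an atom along `direction`.

    `u_cart` is the 3x3 anisotropic displacement tensor in a Cartesian frame.
    """
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)                 # unit vector along the bond
    return float(n @ np.asarray(u_cart, dtype=float) @ n)


def hirshfeld_delta(xyz_a, u_a, xyz_b, u_b):
    """Delta-MSDA = z_A^2 - z_B^2 along the A-B bond (Å^2)."""
    bond = np.asarray(xyz_b, dtype=float) - np.asarray(xyz_a, dtype=float)
    return msda_along(u_a, bond) - msda_along(u_b, bond)


def rigid_bond_ok(xyz_a, u_a, xyz_b, u_b, threshold=0.001):
    """True if |Delta-MSDA| does not exceed the (assumed) threshold."""
    return abs(hirshfeld_delta(xyz_a, u_a, xyz_b, u_b)) <= threshold


# Hypothetical example: a 1.5 Å bond along x with made-up tensors.
u_c1 = np.diag([0.020, 0.030, 0.030])
u_c2_ok = np.diag([0.0205, 0.050, 0.040])   # similar amplitude along the bond
u_c2_bad = np.diag([0.035, 0.050, 0.040])   # much larger amplitude along the bond

print(rigid_bond_ok([0, 0, 0], u_c1, [1.5, 0, 0], u_c2_ok))   # True
print(rigid_bond_ok([0, 0, 0], u_c1, [1.5, 0, 0], u_c2_bad))  # False
```

Only the component of each tensor along the bond matters, which is why the two atoms in the first pair pass despite very different amplitudes perpendicular to the bond.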
Example 1: Error or Fraud?
Structure (I) Submitted to Acta Cryst. (2011)
PLATON Report Part 1
PLATON Report Part 2
Related Structure from the CSD: Structure (II)
Structure Report for (II)
Scatter Plots of I(obs) versus I(calc) for (I) and (II)
Analysis
• Structure (II) has no validation issues.
• The C-CH3 distance in (II) is 1.50 Å, as expected.
• The ‘C-F’ distance in (I) is 1.50 Å, not the expected 1.35 Å.
• Conclusion: Structure (I) is the CH3 variety and not F.
• Data sets of (I) & (II) are not identical (see next).
• Data set (I) is likely based on the CH3 compound.
• Fraud or error? A DIFABS file error?
• The authors of (I) confirmed an error: they had believed an external chemist's proposal. The paper was retracted.
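The key step in this analysis, comparing an observed distance with the values expected for the competing assignments, can be written as a small check. This is a minimal Python sketch that assumes only the two reference distances quoted above and a hypothetical tolerance of 0.05 Å; the dictionary and function names are illustrative.

```python
# Reference distances (Å) as quoted in the analysis above; a real check would
# use tabulated values for the specific bonding environment.
REFERENCE_LENGTHS = {"C-F": 1.35, "C-CH3": 1.50}


def check_assignment(observed, claimed, refs=REFERENCE_LENGTHS, tol=0.05):
    """Compare an observed distance (Å) with the value expected for `claimed`.

    `tol` is an illustrative tolerance, not a published validation criterion.
    """
    deviation = observed - refs[claimed]
    # The assignment whose reference length lies closest to the observation.
    best = min(refs, key=lambda name: abs(refs[name] - observed))
    return {"claimed_ok": abs(deviation) <= tol,
            "best_match": best,
            "deviation": deviation}


result = check_assignment(1.50, "C-F")   # the 'C-F' bond reported for (I)
print(result["claimed_ok"], result["best_match"])  # False C-CH3
```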
Scatter Plots of 2 Data Sets
[Figure panels: Two Unrelated Data Sets | Two Identical Data Sets]
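A scatter plot like these can also be summarised numerically. The Python sketch below (hypothetical function name and data) matches two reflection lists on their hkl indices, fits one least-squares scale factor k minimising Σ(I_a − k·I_b)², and reports the correlation coefficient and the residual scatter after scaling. Two copies of the same data set give a correlation of 1 and essentially zero scatter whatever their relative scale; unrelated data sets do not. It assumes the intensities are already parsed into {(h, k, l): I} dictionaries sharing at least two reflections.

```python
import numpy as np


def compare_datasets(set_a, set_b):
    """Compare two reflection sets given as {(h, k, l): intensity} dictionaries."""
    common = sorted(set(set_a) & set(set_b))        # reflections present in both
    a = np.array([set_a[hkl] for hkl in common], dtype=float)
    b = np.array([set_b[hkl] for hkl in common], dtype=float)
    k = float(a @ b / (b @ b))                       # least-squares scale: a ≈ k * b
    resid = a - k * b
    return {
        "n_common": len(common),
        "scale": k,
        "correlation": float(np.corrcoef(a, b)[0, 1]),
        "rel_scatter": float(np.sqrt(resid @ resid / (a @ a))),
    }


# Hypothetical intensities: data_2 is data_1 on a different scale,
# data_3 is unrelated.
data_1 = {(1, 0, 0): 100.0, (0, 1, 0): 250.0, (0, 0, 1): 40.0, (1, 1, 0): 75.0}
data_2 = {hkl: 0.5 * i for hkl, i in data_1.items()}
data_3 = {(1, 0, 0): 5.0, (0, 1, 0): 20.0, (0, 0, 1): 300.0, (1, 1, 0): 150.0}

print(compare_datasets(data_1, data_2))  # correlation ~1, scale ~2, scatter ~0
print(compare_datasets(data_1, data_3))  # low correlation, large scatter
```

Fitting a scale factor first matters because a re-used data set may have been rescaled before being submitted as a "new" structure.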
CIF versus FCF Data Check
• The R and S values in the three '# R=' lines should be identical within rounding error.
• The reported and calculated residual density ranges should also agree closely.
• This is the case in the first example, but not in the second, where the CIF and FCF data do not match.
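Part of this check can be reproduced by recomputing R1 from the FCF and comparing it with the value reported in the CIF (for example under `_refine_ls_R_factor_gt`). The Python sketch below is an illustration, not PLATON's procedure: it assumes the (Fo², σ(Fo²), Fc²) columns have already been parsed from the FCF, applies the common Fo² > 2σ(Fo²) cut-off, and uses a hypothetical agreement tolerance of 0.001.

```python
import math


def r1_from_fcf(rows, sigma_cut=2.0):
    """R1 = sum(| |Fo| - |Fc| |) / sum(|Fo|) over reflections with Fo^2 > sigma_cut * sigma(Fo^2).

    `rows` is an iterable of (Fo^2, sigma(Fo^2), Fc^2) tuples taken from an FCF.
    """
    num = den = 0.0
    for fo2, sig_fo2, fc2 in rows:
        if fo2 <= sigma_cut * sig_fo2:      # skip 'unobserved' reflections
            continue
        fo = math.sqrt(fo2)
        fc = math.sqrt(max(fc2, 0.0))
        num += abs(fo - fc)
        den += fo
    return num / den


def cif_matches_fcf(cif_r1, fcf_r1, tol=0.001):
    """True if the R1 reported in the CIF agrees with the recomputed value."""
    return abs(cif_r1 - fcf_r1) <= tol


# Hypothetical FCF rows: (Fo^2, sigma(Fo^2), Fc^2)
rows = [(100.0, 5.0, 81.0), (400.0, 10.0, 441.0), (4.0, 3.0, 1.0)]
r1 = r1_from_fcf(rows)                 # third reflection is below the cut-off
print(round(r1, 4))                    # 0.0667
print(cif_matches_fcf(0.0667, r1))     # True
print(cif_matches_fcf(0.0450, r1))     # False
```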
Example 2: Iron(III) Complex
Fe(III) Validation Part 1
Fe(III) Validation Part 2
Example 2: Difference Density Map
Fe Structure Re-refined
Conclusion?
• Is the structure now OK after an erratum?
• Search for similar (isomorphous) structures in the CSD.
• Yes: there is an isomorphous Mn complex, published by a different set of authors from a different university.
• Let us compare both structures.
Isomorphous Mn(III) Complex
Mn Structure Validation Part 1
Mn Validation Part 2
Scatter Plot of I(obs): Fe versus Mn
Fe and Mn Data Sets Identical !
Analysis of the Fe/Mn Structures
• The displacement parameters in the CIF for the H2O molecule in the Fe complex differ from those used in the final refinement.
• The reflection sets are identical for papers from two different sets of authors at different locations.
• CSD: unusual coordination distances.
• Fraud or error?
• Withdraw/retract one or both?
Validation Challenges
• Avoid false positive and false negative ALERTS
• Disordered structures (true disorder or artifact)
• Handling of twinning (data names missing)
• Powder structure validation (experts needed)
• Incommensurate structure validation (experts needed)
• Fabricated reflection data: can we detect them?
• Education: what is the meaning of an ALERT?
• Should validation criteria be different for structures published in chemical journals?
Concluding Remarks
• PLATON includes a standalone validation tool. This tool is also part of the web-based IUCr checkCIF/PLATON service, which is capably managed by Mike Hoyland (IUCr).
• Validation is still a learning process.
• Chemical insight can be very helpful, and is often decisive, as a validation tool.
• Deposition of structure factors should be a requirement for all journals (the CCDC now accepts them along with the CIF).
Thanks To
• Martin Lutz and many others for taking the time to bring various unresolved issues to my attention with actual data.
• Send to [email protected]