
Structure Validation Challenges in Chemical Crystallography

Ton Spek Utrecht University, The Netherlands.

Madrid, Aug. 26, 2011

Validation History

• Structure Validation of data supplied in computer-readable CIF format was pioneered by Acta Cryst. C (Syd Hall et al., 1990s).

• Initially the numerical checking of papers submitted to Acta C in CIF format was done by the Chester staff.

• Subsequently automated checking of the CIF for data consistency, data completeness and validity was introduced (checkCIF).

• PLATON facilities to check for Missed Symmetry and VOIDS were added later on.

• These were soon followed by the inclusion of numerous other PLATON-based tests (PLATxxx) of the reported structure (currently more than 400).

checkCIF/PLATON

FCF Validation

• Fo/Fc reflection file deposition and archival in CIF format (FCF) was made mandatory early on for Acta Cryst. papers.

• Useful for subsequent analysis of possibly unique data.

• CIF + FCF checking was added in 2010 into the IUCr CheckCIF/PLATON suite.

• Major chemical journals now require CIF deposition and validation reports but not (yet) the deposition of reflection data.

• The CCDC now accepts FCFs for deposition.

Why Automated Structure Validation

• The large volume of new and routine structure reports submitted for publication.

• The limited number of experienced and available crystallographic referees for validation.

• Detection of errors due to the black box use of crystallography by non-crystallographers.

• Setting standards of quality and reliability.

• Automated detection of unusual, though not necessarily erroneous, issues that need special attention (reported as ALERTS A, B, C, G).

• Sadly: the need to detect fraudulent structures.

Systematic Fraud

• A massive fraud was detected in late 2009, involving structures mainly published around 2007 in Acta Cryst. E (soon 200 retractions!).

• Before 2010, nobody was prepared for serious and systematic fraud in this non-competitive field of routine structures.

• Many deviations from the expected results can often be explained by errors, inexperience or poor data.

• Several retractions before 2010 might in hindsight concern fraudulent structures and not errors.

• Ongoing testing of our validation software on the archived data for structures published in Acta E often indicated suspect structures needing a more detailed investigation.

• It was only by following up on one such strange structure report with an analysis of all structures published by the authors of that paper that a fraud pattern emerged.

• It was discovered that the same data set was used to publish a series of invented isomorphous structures.

• Full story: Acta Cryst. E (2010) editorial and a PowerPoint presentation by the E-section editor Jim Simpson (IUCr website).

Bogus Variations (with Hirshfeld ALERTS) on the Published Structure 2-hydroxy-3,5-nitrobenzoic acid (ZAJGUM)

• Substitutions shown: OH => NH2, NO2 => COOH, H2O => NH3, OH => F

Fraud Detection Tools

• Generalized Hirshfeld Rigid Bond Test.

• CIF versus FCF data checking.

• Scatter Plots of the reflection data of the same or related structure(s).

• Look in Difference Maps for unusual features.

• SHELXL re-refinement using the supplied CIF & FCF data.

• Check in the CSD for related structures.

• Two case studies that illustrate the use of the above validation and analysis tools follow.
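As an illustration of the first tool in the list above, here is a minimal sketch of the classic Hirshfeld rigid-bond criterion: for two bonded atoms, the mean-square displacement amplitudes along the bond direction should be nearly equal. The sketch assumes displacement tensors already converted to a Cartesian frame, uses the commonly quoted guideline of about 0.001 Å² for atoms at least as heavy as carbon, and works on made-up numbers. It is not PLATON's generalized implementation, and all function names are hypothetical.

```python
import math

# Commonly quoted guideline for atoms at least as heavy as carbon (assumption
# for this sketch; validation software may apply different, scaled limits).
HIRSHFELD_LIMIT = 0.001  # Å^2


def msda_along(u_cart, direction):
    """Mean-square displacement amplitude n^T U n along a unit vector n."""
    return sum(direction[i] * u_cart[i][j] * direction[j]
               for i in range(3) for j in range(3))


def rigid_bond_delta(pos_a, u_a, pos_b, u_b):
    """Absolute difference of the amplitudes of atoms A and B along the A-B bond.

    Positions are Cartesian coordinates in Å, u_a and u_b are 3x3 Cartesian
    displacement tensors in Å^2.
    """
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    length = math.sqrt(sum(c * c for c in bond))
    unit = [c / length for c in bond]
    return abs(msda_along(u_a, unit) - msda_along(u_b, unit))


def diag(u11, u22, u33):
    """Helper for the example: a diagonal displacement tensor."""
    return [[u11, 0.0, 0.0], [0.0, u22, 0.0], [0.0, 0.0, u33]]


# Made-up example values, bond along the x axis.
delta_ok = rigid_bond_delta((0, 0, 0), diag(0.0200, 0.030, 0.025),
                            (1.5, 0, 0), diag(0.0205, 0.040, 0.030))
delta_bad = rigid_bond_delta((0, 0, 0), diag(0.020, 0.030, 0.025),
                             (1.5, 0, 0), diag(0.035, 0.040, 0.030))

for label, delta in (("plausible", delta_ok), ("suspect", delta_bad)):
    flag = "ALERT" if delta > HIRSHFELD_LIMIT else "ok"
    print(f"{label}: delta = {delta:.4f} Å^2 -> {flag}")
```

A large difference along a bond often indicates a wrongly assigned atom type, which is why invented element substitutions tend to trigger this kind of test.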

Example 1: Error or Fraud ?

Structure I Submitted to Acta Cryst. (2011)

PLATON Report Part 1

PLATON Report Part 2

RELATED STRUCTURE FROM THE CSD Structure II

Structure Report for II

Scatter Plots I(obs) versus I(calc) for (I) and (II)

Analysis

• Structure (II) has no validation issues.

• C-CH3 distance in (II) of 1.50 Å, as expected.

• ‘C-F’ distance in (I) is 1.50 Å and not the expected 1.35 Å.

• Conclusion: Structure (I) is the CH3 variety and not F.

• Data sets of (I) & (II) are not identical (see next).

• Data set (I) likely based on CH3 compound.

• Fraud or Error ? DIFABS file Error ?

• The authors of (I) confirmed the error: they had believed an external chemist's proposal. The paper was retracted.
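The bond-length argument in the analysis above can be sketched as a simple lookup: compare the observed distance with the expected value for the claimed bond type and see which bond type it fits best. The two expected distances are the ones quoted on the slide. The tolerance of 0.05 Å and the function names are assumptions made only for this illustration.

```python
# Expected distances in Å, as quoted in the analysis above.
EXPECTED = {"C-CH3": 1.50, "C-F": 1.35}


def best_matching_bond_type(observed, expected=EXPECTED):
    """Return the bond type whose expected length is closest to the observed one."""
    return min(expected, key=lambda name: abs(expected[name] - observed))


def assignment_is_suspect(claimed, observed, tolerance=0.05):
    """True if the observed length deviates from the claimed type by more than
    the tolerance (0.05 Å is a hypothetical cut-off for this sketch)."""
    return abs(EXPECTED[claimed] - observed) > tolerance


observed_in_structure_I = 1.50  # Å, the reported 'C-F' distance in (I)
print(best_matching_bond_type(observed_in_structure_I))
print(assignment_is_suspect("C-F", observed_in_structure_I))
```

A check like this only raises a question. As the slide notes, deciding between fraud and error still needs the reflection data and contact with the authors.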

Scatter Plots of 2 Data Sets

Panels: Two Unrelated Data Sets; Two Identical Data Sets
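A numerical counterpart to such scatter plots could look like the sketch below: match the reflections of two data sets by their hkl indices, then compute a correlation coefficient and a scaled disagreement factor on I(obs). Identical data sets give a correlation close to 1 and a disagreement close to 0, even if one set was rescaled. The input layout (a dictionary from hkl to I(obs)) and all names are assumptions for this illustration, and the numbers are made up.

```python
import math


def compare_intensities(set_a, set_b):
    """Compare two reflection sets given as {(h, k, l): I_obs} dictionaries.

    Returns (number of common reflections, Pearson correlation,
    sum|Ia - k*Ib| / sum|Ia| with least-squares scale k).
    """
    common = sorted(set(set_a) & set(set_b))
    if len(common) < 2:
        raise ValueError("need at least two common reflections")
    xs = [set_a[hkl] for hkl in common]
    ys = [set_b[hkl] for hkl in common]
    n = len(common)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    corr = sxy / math.sqrt(sxx * syy)
    # Least-squares scale factor that puts set B on the scale of set A.
    scale = sum(x * y for x, y in zip(xs, ys)) / sum(y * y for y in ys)
    disagreement = (sum(abs(x - scale * y) for x, y in zip(xs, ys))
                    / sum(abs(x) for x in xs))
    return n, corr, disagreement


# Made-up intensities for illustration only.
data_a = {(1, 0, 0): 100.0, (0, 1, 0): 250.0, (0, 0, 1): 40.0, (1, 1, 0): 900.0}
data_rescaled = {hkl: 2.0 * i_obs for hkl, i_obs in data_a.items()}
data_other = {(1, 0, 0): 300.0, (0, 1, 0): 20.0, (0, 0, 1): 700.0, (1, 1, 0): 150.0}

print(compare_intensities(data_a, data_rescaled))
print(compare_intensities(data_a, data_other))
```

Genuinely isomorphous structures also show correlated intensities, so a high correlation alone proves nothing. It is agreement at the level of measurement noise that points to a reused data set.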

CIF versus FCF data Check

• The R & S values in the three lines # R= should be identical within rounding error.

• The reported and calculated residual density ranges should also be nearly identical.

• This is the case in the first example but not in the second, where the CIF & FCF data do not match.
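The idea behind the CIF versus FCF check can be sketched as follows: recompute an R value from the deposited Fo/Fc pairs and test whether it agrees with the value reported in the CIF to within the rounding of the reported number. The sketch uses the conventional definition R1 = Σ||Fo| − |Fc|| / Σ|Fo| on made-up amplitudes. Reading real CIF and FCF files, the observed-reflection cut-off, and the S value are left out, and the function names are hypothetical.

```python
def r1_from_amplitudes(reflections):
    """R1 = sum(| |Fo| - |Fc| |) / sum(|Fo|) over (Fo, Fc) pairs."""
    pairs = list(reflections)
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in pairs)
    denominator = sum(abs(fo) for fo, _ in pairs)
    return numerator / denominator


def matches_reported(calculated, reported, decimals=4):
    """True if the calculated value agrees with a reported value that was
    rounded to the given number of decimals."""
    return abs(calculated - reported) <= 0.5 * 10 ** -decimals + 1e-12


# Made-up (Fo, Fc) amplitudes standing in for the contents of an FCF.
fcf_pairs = [(10.0, 9.5), (20.0, 20.4), (5.0, 5.1), (15.0, 14.0)]
r1 = r1_from_amplitudes(fcf_pairs)

print(f"R1 from FCF: {r1:.4f}")
print("matches CIF value 0.0400:", matches_reported(r1, 0.0400))
print("matches CIF value 0.0321:", matches_reported(r1, 0.0321))
```

A mismatch means that the deposited CIF and FCF do not belong to the same refinement, which is a reason to look closer but not in itself evidence of fraud.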

Example 2: Iron(III) Complex

Fe(III) Validation Part 1

Fe(III) Validation Part 2

Example 2: Difference Density Map

Fe Structure Re-refined

Conclusion ?

• Structure now O.K. after an erratum ?

• Search for similar (isomorphous) structures in the CSD.

• Yes, there is an isomorphous Mn complex published by a different set of authors from a different university.

• Let us compare both structures.

Isomorphous Mn(III) Complex

Mn Structure Validation Part 1

Mn Validation Part 2

Scatter Plot Fe versus Mn I(obs)

Fe and Mn Data Sets Identical !

Analysis on Fe/Mn Structures

• The Displacement parameters in the CIF for the H2O molecule in the Fe complex are different from those used in the final refinement.

• Reflection sets identical for papers from two different sets of authors and locations.

• CSD: Unusual coordination distances

• Fraud or Error ?

• Withdraw/Retract one or both ?

Validation Challenges

• Avoid False Positive and Negative ALERTS

• Disordered structures (true or artifact)

• Handling of Twinning (data names missing)

• Powder structure validation (experts needed)

• Incommensurate structure validation (experts)

• Fabricated reflection data – Can we detect them ?

• Education – What is the meaning of an ALERT ?

• Should validation criteria be different for structures published in chemical journals ?

Concluding Remarks

• PLATON includes a standalone Validation Tool. It is part of the WEB-based IUCr CheckCIF/PLATON Tool that is capably managed by Mike Hoyland (IUCr).

• Validation is still a learning process.

• Chemical insight might be very helpful and often decisive as a validation tool.

• Deposition of structure factors should be a requirement for all journals (the CCDC now accepts those along with the CIF).

Thanks To

• Martin Lutz and many others for taking the time to bring various unresolved issues to my attention with actual data.

• Send to [email protected]