Crystal Structure EPrints: Publication @ Source Through



‘The eCrystals Federation’: Management and Publication of Small Molecule Structure Data for the Whole Crystallographic Community

http://wiki.ecrystals.chem.soton.ac.uk

S.J. Coles (a*), M.B. Hursthouse (a), R.A. Stephenson (a), P. Cliff (b), E. Lyon (b), M. Patel (b), J. Downing (c) & P. Murray-Rust (c)

(a) School of Chemistry, University of Southampton, UK; (b) UKOLN, University of Bath, UK; (c) Unilever Centre for Molecular Informatics, University of Cambridge, UK.

The Data Publication Problem

UK funding councils recently stated that ‘data underpinning the published results of publicly-funded research should be made available as widely and rapidly as possible’. However, current bottlenecks in the publication of crystal structure data hinder the growth of databases: only about 500,000 small unit cell crystal structures are available in the CSD, ICSD and CRYSMET, whereas it is estimated that at least three times this number have been determined in laboratories worldwide. In addition, publication in the mainstream literature still offers only indirect (and often subscription-controlled) access to these data.

The eCrystals archive (pictured on the right) has been built to address this problem. On the one hand, the archive can support and manage ALL the digital files generated during a crystallographic experiment. On the other hand, it can act as a publication tool by making metadata describing these crystallographic datasets publicly available. This changes the traditional model of peer review: because the crystal structure data are openly provided, readers and users can check their correctness and validity directly.

The Federation Composition


1. Institutions: the Universities of Southampton, Cambridge, Glasgow, Newcastle, Indiana (ReciprocalNet) and Sydney, together with ARCHER and STFC, represent institution-based repositories. Partners have been selected for their significance in crystallography and together form a global, multi-platform data network.

2. Scientists: practising chemists and crystallographers in the laboratory who create crystal structures as part of their routine workflow.

3. Data centres: the CCDC is a professional body with a subject repository for crystal data, and the CDS is a national service that provides federated searching across chemistry databases. Both are considered the primary data harvesters of eCrystals.

4. Publishers: the IUCr, the learned society representing crystallography, is a publisher and also maintains standards for communicating and representing crystal structures. The RSC is a learned society and publisher, and Chemistry Central is an emerging Open Access publisher.

5. Users: scientists in related disciplines, students and other third parties who need crystallographic data for their research.

6. Advisory services: the Digital Curation Centre will provide guidance on sustainability, preservation and policy matters.

7. Third-party services: these will be developed across the repository federation infrastructure, providing linking and value-added functionality.

Harvesting, Institutional Support & Third Party Services

Metadata describing each dataset are exposed through a public interface using a digital library protocol, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This protocol enables third parties to ‘harvest’ information on the content of the archive. The archive provides primary bibliographic data, such as the title (the IUPAC name), authors and affiliation. It also provides chemical metadata, such as the International Chemical Identifier (InChI), empirical formula, compound class and keywords. Each dataset is registered with a persistent identifier (DOI), which enables the generation of a permanent citation. The OAI record also states which stages of the experimental process have associated files, so that a harvester can assess whether an entry suits its purpose.

Services can then ‘aggregate’ the metadata, that is, perform linking and cross-referencing exercises that let the researcher navigate seamlessly through the literature. UKOLN (University of Bath) and the eCrystals team have designed a prototype service that aggregates metadata harvested from the archive with the primary crystallographic literature (IUCr journals).
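The harvesting step described above can be sketched in Python using only the standard library. This is a minimal, hypothetical sketch. The endpoint URL, helper names and sample record are illustrative assumptions, not part of eCrystals. It requests simple Dublin Core (`oai_dc`), the metadata format that every OAI-PMH 2.0 repository must support. The richer eCrystals metadata (InChI, formula, per-stage file information) is not modelled here. The `fetch` argument is injected so that the paging logic can be exercised without network access.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so on follow-up
    requests it replaces metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        entry = {"oai_id": header.findtext("oai:identifier", namespaces=NS)}
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is not None:  # deleted records carry no metadata
            for el in dc:
                # "{namespace}title" -> "title"
                name = el.tag.split("}", 1)[1]
                entry.setdefault(name, []).append((el.text or "").strip())
        records.append(entry)
    # An absent or empty resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def harvest_all(base_url, fetch):
    """Yield every record, following resumption tokens page by page.

    `fetch` maps a URL to the response body as text, e.g.
    lambda u: urllib.request.urlopen(u).read().decode("utf-8")
    """
    token = None
    while True:
        url = list_records_url(base_url, resumption_token=token)
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            break


# Invented sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>4-nitrophenol</dc:title>
          <dc:creator>Coles, S.J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_list_records(SAMPLE_RESPONSE)
    for r in records:
        print(r["oai_id"], r.get("title"), "next page:", token)
```

Keeping network access behind `fetch` is a design choice, not a protocol requirement. A real harvester would also need to handle OAI-PMH error responses and rate limiting, which this sketch omits.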

Future work in this area will focus on enabling data-based harvesters to harvest datasets automatically, so that eCrystals entries can be indexed and incorporated into subject-specific databases (e.g. the CSD). Examples of such harvesters are CrystalEye (http://wwmm.ch.cam.ac.uk/crystaleye/) and the Chemical Database Service. CrystalEye will aggregate datasets with the broader chemical literature (IUCr and RSC journals). Current developments include securing backing from host institutions. We are in the process of making an agreement with the University of Southampton to support this archive as part of its Institutional Repository scheme, hosted by our Library and Information Services department.
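A data-based harvester of the kind described above makes two decisions. First, it checks whether an entry is appropriate, that is, whether files exist for the experimental stages the target database needs. Second, it indexes the entries it accepts. The Python sketch below illustrates only those two steps, under stated assumptions. The `HarvestedEntry` fields, the stage labels ("raw", "final") and indexing by InChI are hypothetical choices, not the actual eCrystals, CrystalEye or CSD schema. The DOIs use a placeholder prefix.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HarvestedEntry:
    """A harvested eCrystals-style entry (field names are hypothetical)."""
    doi: str
    inchi: str
    formula: str
    # Experimental stages for which the record reports files (hypothetical labels).
    stages_with_files: set = field(default_factory=set)


def select_entries(entries, required_stages):
    """Keep only entries that provide files for every required stage."""
    required = set(required_stages)
    return [e for e in entries if required <= e.stages_with_files]


def index_by_inchi(entries):
    """Group DOIs by InChI so repeat determinations of a compound sit together."""
    index = defaultdict(list)
    for e in entries:
        index[e.inchi].append(e.doi)
    return dict(index)


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    entries = [
        HarvestedEntry("10.5555/example.1", "InChI=1S/H2O/h1H2", "H2 O", {"raw", "final"}),
        HarvestedEntry("10.5555/example.2", "InChI=1S/H2O/h1H2", "H2 O", {"final"}),
        HarvestedEntry("10.5555/example.3", "InChI=1S/CH4/h1H4", "C H4", {"raw"}),
    ]
    accepted = select_entries(entries, {"final"})
    print(index_by_inchi(accepted))
```

Indexing by InChI is one plausible choice here because the identifier is derived from the chemical structure rather than from the repository. A real subject database would apply its own validation before incorporating an entry.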