PowerPoint Presentation

WP4
Current status
Paolo Romano & WP4 group
VI EBRCN GM, Paris, 10-11/12/2003
WP4: Linking to Medline (i)
Retrieval of PUBMED IDs still ongoing
• MK amended DSMZ literature db
• FG supporting BCCM, manual insertion after
restructuring of dbs
• GS retrieving PMIDs for CBS catalogues (not only
Medline)
• PR retrieving PMIDs for NCCB and CIP catalogues
• Undefined CABI, NCIMB (updates?)
• ECACC?
WP4: Linking to Medline (ii)
Catalogue structures
• ongoing (collections’ tasks)
• done for ICLC, BCCM/LMBP, DSMZ (maybe others
as well)
CABRI guidelines and SRS configuration files
• Catalogue Production Guidelines revision ongoing
• done for cell lines, plasmids, DSMZ literature
• SRS structure and syntax files are being updated
as catalogues are submitted
WP4: Linking to Medline (iii)
Catalogues (with link to Medline) updated
• ICLC: September 2003 (revision)
• BCCM/LMBP: December 2003 (revision)
• DSMZ: November (June) 2003
• Other BCCM, NCCB plasmids: at next catalogue
update
• All catalogues: according to collections’ plans
Linking to EMBL (i)
• Linking “on-the-fly” to EMBL Data Library through
SRS, without IDs, gave negative results:
• Links are different for different materials and can use
various EMBL fields:
• Organism (micro-organisms), Division (viruses and plasmids),
Feature Table (definition of the source through Key, Qualifier,
Description)
• Annotation problems (e.g., missing spaces)
• Indexing problems (e.g., use of dots)
Linking to EMBL (ii)
(Well-known) example of an “on-the-fly” search:
• Searching for filamentous fungi strain CBS 100.20
Involves: fungi & source & cbs 100.20
( ( ([emblrelease-FtKey:source] &
[emblrelease-FtQualifier:strain] &
( ( [emblrelease-FtDescription:cbs] &
[emblrelease-FtDescription:100] ) |
[emblrelease-FtDescription:cbs100] ) &
[emblrelease-FtDescription:20]) )
< [emblrelease-Organism:fungi*] )
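The query above can be generated from a strain designation. The following is a minimal sketch, assuming the SRS field names and operators exactly as they appear on the slide; the function name `build_strain_query` and the way it generalises to other strains are illustrative assumptions, not part of CABRI or SRS. It covers the "missing spaces" case (e.g. `cbs100`) and treats the dot as a token separator.

```python
def build_strain_query(strain: str, organism: str, library: str = "emblrelease") -> str:
    """Build an SRS-style query for a strain such as "CBS 100.20".

    Hypothetical helper: field names (FtKey, FtQualifier, FtDescription,
    Organism) are copied from the slide; everything else is an assumption.
    """
    # "CBS 100.20" -> acronym "cbs", number "100.20"
    acronym, number = strain.lower().split(maxsplit=1)
    # Dots are not indexed as part of a token, so split the number on them.
    parts = number.split(".")
    first, rest = parts[0], parts[1:]

    def field(name: str, value: str) -> str:
        return f"[{library}-{name}:{value}]"

    # Annotation may read "CBS 100" (two tokens) or "CBS100" (missing space).
    spaced = f"({field('FtDescription', acronym)} & {field('FtDescription', first)})"
    fused = field("FtDescription", acronym + first)

    terms = [
        field("FtKey", "source"),
        field("FtQualifier", "strain"),
        f"({spaced} | {fused})",
    ]
    # Remaining number fragments (e.g. "20") must also appear in the description.
    terms += [field("FtDescription", p) for p in rest]

    source = " & ".join(terms)
    return f"(({source}) < {field('Organism', organism + '*')})"


if __name__ == "__main__":
    print(build_strain_query("CBS 100.20", "fungi"))
```

Even with such a generator, the query stays fragile: it depends on how each EMBL entry happens to be annotated and indexed, which is the problem the slide points out.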
Linking to EMBL (iii)
• Identify cross-references for linking from CABRI
catalogues to EMBL (and vice versa) by unique
IDs (single run for all CABRI records)
• Many EMBL records can be linked to a single
CABRI item
• Add links in EMBL and use these links when
linking from CABRI (fast and effective search by
SRS)
• ID-based links to CABRI included in the EMBL data
library and distributed with it
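A minimal sketch of the ID-based approach, assuming the cross-reference list is a set of (CABRI ID, EMBL accession) pairs. The identifiers below are placeholders, and the function names are illustrative only. The sketch shows the one-to-many relation (several EMBL records per CABRI item) and the inverse table needed for links in the other direction.

```python
from collections import defaultdict


def group_crossrefs(pairs):
    """Group (cabri_id, embl_accession) pairs by CABRI item, dropping duplicates."""
    by_cabri = defaultdict(list)
    for cabri_id, embl_acc in pairs:
        if embl_acc not in by_cabri[cabri_id]:
            by_cabri[cabri_id].append(embl_acc)
    return dict(by_cabri)


def invert_crossrefs(by_cabri):
    """Build the reverse table: EMBL accession -> CABRI items it refers to."""
    by_embl = defaultdict(list)
    for cabri_id, accessions in by_cabri.items():
        for acc in accessions:
            by_embl[acc].append(cabri_id)
    return dict(by_embl)


if __name__ == "__main__":
    # Placeholder identifiers, not real catalogue or EMBL entries.
    pairs = [
        ("CBS 100.20", "ACC0001"),
        ("CBS 100.20", "ACC0002"),
        ("DSM 1", "ACC0003"),
        ("CBS 100.20", "ACC0001"),  # duplicate pair, ignored
    ]
    by_cabri = group_crossrefs(pairs)
    print(by_cabri)
    print(invert_crossrefs(by_cabri))
```

Once such a table exists, a link is a plain ID lookup instead of a free-text search over annotation fields.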
Linking to EMBL (iv)
• Agreement with EBI (list of cross-references)
• Work to be done (vs EMBL 77) after uploading of
CABRI extracted catalogues to EBI: early 2004
• Cross-references returned to collections (no
obligation to add links to catalogues)
• Well-known “wrong” EMBL sequences, if any,
removed from the table
• Links from plasmids catalogues to EMBL
managed differently (using current remarks)
Linking to other sources
• Links to plasmids’ maps (BCCM/LMBP) through a
dedicated field: December 2003
• Images of micro-organisms (CBS & BCCM) linked
from a new field: starting from next updates
• Enzyme and biochemical pathways (cell lines,
microorganisms): under development
• Further links (nomenclature, acronyms, genes)
still under analysis
• Interconnected Biological Resource Database
Extracted databases
• Possible since availability of the new site (SRS 7)
• Selected meaningful subset of information:
CABRI MDS + link to CABRI site (new field Full_details)
• Established agreement with EBI
• Preparation of extracted databases:
• Focus on bacteria, fungi & yeasts, human and animal cell lines
• Setting up of a dedicated Web site: http://export.cabri.org/
• Setting up of an FTP site for distributing data and SRS
configuration files: ftp.cabri.org (not anonymous)
• Upload of catalogues to EBI: early 2004
• Automatic updating by EBI by FTP through SRS Prisma
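A minimal sketch of how an extracted record could be derived from a full catalogue record: keep a small subset of fields and add a `Full_details` link back to the full entry. Only the field name `Full_details` comes from the slide. The other field names, the `MDS_FIELDS` tuple, and the URL layout are hypothetical placeholders, not the actual CABRI MDS or site structure.

```python
from urllib.parse import quote

# Hypothetical stand-in for the minimum data set (MDS) field list.
MDS_FIELDS = ("strain_number", "name", "other_collection_numbers")


def extract_record(full_record: dict, base_url: str, fields=MDS_FIELDS) -> dict:
    """Return the reduced record plus a link to the full catalogue entry."""
    # Keep only the selected fields that are actually present.
    extracted = {key: full_record[key] for key in fields if key in full_record}
    # Assumed URL layout: <base_url>/<url-encoded strain number>.
    entry_id = quote(full_record["strain_number"])
    extracted["Full_details"] = f"{base_url.rstrip('/')}/{entry_id}"
    return extracted


if __name__ == "__main__":
    full = {
        "strain_number": "CBS 100.20",
        "name": "Example organism",
        "growth_conditions": "not part of the extracted subset",
    }
    print(extract_record(full, "http://example.org/catalogue/"))
```

The extracted file stays small and stable, while details that change often remain on the collection's own site behind the link.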
Inventory of data usage and sets
• GlobalSearch on partners’ sites
• ht://Dig can be used to index all partners’ sites and
search their contents in a single step (only static files,
not searchable archives/databases)
http://srs711.cabri.org/htdig/index-ebrcn.html
• Virtual BRCs’ Library (W3C Virtual Library)
• List of data sets, by category, with links to information
sources (Map of sites’ maps)
• Includes links to archives/databases
Summary
• MEDLINE
  • Links to Medline already in place for many catalogues
  • New links added with periodical updates
  • New records include PUBMED ID
• EMBL
  • No news, waiting for extracted catalogues available at EBI
• Other external links
  • Plasmids’ maps and micro-organisms images ongoing
  • Other under study
• Extracted databases
  • Dedicated web and FTP sites available
  • Focus on bacteria, fungi & yeasts, human and animal cell lines
  • Uploading to EBI planned early 2004
• Inventory of data usage and data sets
  • Search on partners’ site contents (ht://Dig)
  • List of partners’ site contents (sort of “Map of sites’ maps”, including dbs)
Thoughts about the future (i)
• CABRI as it is
• Many links to external databases are being set up and are already in place
for some of the catalogues
• Extracted databases will soon be uploaded to EBI
• Integration made possible (mainly) because of the adoption of SRS
• CABRI sites are now well-known, appreciated and widely used network services
• GBIF perspective
• GBIF has designed a nice and innovative architecture
• Distributed architecture can help management by avoiding conversions
and updates
• It requires a sound expertise and good computer skills, not always
available at collections/BRCs
• The ABCD Schema is not adequate for catalogues’ contents
Thoughts about the future (ii)
• We need to keep current and set up new links
• Current links with the molecular biology world should be kept
• SRS is an essential key for this connection
• Web Services based GBIF architecture must be taken into account for the
future links with the (quickly) evolving biodiversity information environment
• SRS is evolving
• Since SRS 6, XML has been incorporated
• With SRS 7, XML is essential (alternative to flat files)
• With SRS 8, Web Services will be added and SRS itself will be able to
provide Web Services and access them remotely
Thoughts about the future (iii)
• Proposal
• Start by extending the ABCD Schema to meet our needs
• Continue with SRS and follow its evolution
• Adopt as early as possible the new SRS Web Services features and start
offering information to GBIF
• Individual collections/BRCs willing to go autonomous can stop submission
of data, provided they offer an agreed interface for remote access by the
central SRS based system
• Finally, reach a mixed distributed/centralized architecture, based on SRS
and offering both standard SRS services and Web Services
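The proposed mixed architecture can be pictured as a central system that searches both catalogues it hosts itself and catalogues reached through an agreed remote interface. The sketch below is a conceptual illustration only: the class names, the `search` method, and the callable standing in for a remote Web Service are assumptions, and nothing here reflects actual SRS or GBIF interfaces.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class CatalogueSource(Protocol):
    """The 'agreed interface': anything that can answer a search term."""

    def search(self, term: str) -> List[str]: ...


class CentralCatalogue:
    """Catalogue whose records were submitted to the central system."""

    def __init__(self, records: Iterable[str]) -> None:
        self.records = list(records)

    def search(self, term: str) -> List[str]:
        return [r for r in self.records if term.lower() in r.lower()]


class RemoteCatalogue:
    """Autonomous collection queried remotely.

    `fetch` stands in for a remote call (e.g. a Web Service); it is a
    plain callable here so the sketch runs without network access.
    """

    def __init__(self, fetch: Callable[[str], Iterable[str]]) -> None:
        self.fetch = fetch

    def search(self, term: str) -> List[str]:
        return list(self.fetch(term))


def federated_search(sources: Dict[str, CatalogueSource], term: str) -> Dict[str, List[str]]:
    """Query every registered source through the same interface."""
    return {name: source.search(term) for name, source in sources.items()}


if __name__ == "__main__":
    sources = {
        "central": CentralCatalogue(["Strain A (fungi)", "Strain B (bacteria)"]),
        "autonomous": RemoteCatalogue(lambda term: [f"remote hit for {term}"]),
    }
    print(federated_search(sources, "fungi"))
```

The point of the design is that a collection can move from the first kind of source to the second without any change on the user's side.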