WP4
Current status
Paolo Romano & WP4 group
VII EBRCN GM, Berlin, 26-27/09/2004
WP4 objectives
Improved accessibility and interconnection
• Links to external resources
• Literature, Sequence, Special interest databases
• Extracted databases
• Available at interested SRS sites
• Inventory of data and usage
• Local and remote search, sites’ map
Links to external resources
Literature
Medline, Taxon
Special interest
Micro-organisms images
Plasmids’ maps
Sequences
EMBL Data Library
Links to Medline
Syntax:
add [PMID: <number>] after bibliographic reference
Links in place (> 7000):
Plasmids: LMBP (375), NCCB (30)
Cell lines: ICLC (294), DSMZ (905)
Fungi: CBS (454)
Yeasts: CBS (1132)
Phages: NCCB (30)
Literature reference file: DSMZ (3818)
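The "[PMID: <number>]" convention above can be picked up mechanically. The following is a minimal sketch, assuming the marker is written exactly as on the slide (optional space after the colon); the URL template and the sample reference are illustrative assumptions, not taken from the CABRI catalogues.

```python
import re
from typing import List

# Matches the "[PMID: <number>]" marker appended to a bibliographic reference.
PMID_PATTERN = re.compile(r"\[PMID:\s*(\d+)\]")

# Assumed link target for illustration; the slides do not name the Medline front end.
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def extract_pmids(reference: str) -> List[str]:
    """Return every PMID found in a reference string, in order of appearance."""
    return PMID_PATTERN.findall(reference)


def medline_links(reference: str) -> List[str]:
    """Turn each PMID marker into a link using the assumed URL template."""
    return [PUBMED_URL.format(pmid=pmid) for pmid in extract_pmids(reference)]


# Made-up reference, used only to exercise the functions.
example = "Doe J. et al., J. Hypothetical Res. 12:34-56 (1999) [PMID: 1234567]"
print(medline_links(example))
```

References without a marker simply yield an empty list, so catalogues can adopt the syntax gradually.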
Special interest databases
Plasmids’ maps:
Syntax:
New FDS field: ‘External_links map <name>’
Links in place:
Plasmids: LMBP (777)
Images of micro-organisms:
Syntax:
New FDS field: ‘External_links image <name>’
Links in place:
None (waiting for next catalogues’ update)
Linking to EMBL (i)
• Linking “on-the-fly” to EMBL Data Library through
SRS, without IDs, gave negative results:
• Links are different for different materials and can use
various EMBL fields:
• Organism (micro-organisms), Division (viruses and plasmids),
Feature Table (definition of the source through Key, Qualifier,
Description)
• Annotation problems (e.g., missing spaces)
• Indexing problems (e.g., use of dots)
Linking to EMBL (ii)
A well-known example of an “on-the-fly” search:
• Searching for filamentous fungi strain CBS 100.20
Involves: fungi & source & cbs 100.20
( ( ([emblrelease-FtKey:source] &
[emblrelease-FtQualifier:strain] &
( ( [emblrelease-FtDescription:cbs] &
[emblrelease-FtDescription:100] ) |
[emblrelease-FtDescription:cbs100] ) &
[emblrelease-FtDescription:20]) )
< [emblrelease-Organism:fungi*] )
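To show why ID-less linking is cumbersome, here is a minimal sketch that assembles a query of the same shape for an arbitrary strain number. The function name and its parameters are hypothetical; only the field names (FtKey, FtQualifier, FtDescription, Organism) and the overall structure come from the example above, and the parenthesisation is simplified.

```python
def on_the_fly_query(acronym: str, number: str, organism: str) -> str:
    """Build an SRS-style query string for a strain such as CBS 100.20.

    Hypothetical helper: it covers both spellings seen in EMBL annotations,
    'cbs 100.20' (acronym and number as separate words) and 'cbs100.20'.
    """
    def desc(word: str) -> str:
        return "[emblrelease-FtDescription:{}]".format(word)

    acr = acronym.lower()
    parts = number.lower().split(".")  # dots split words in the EMBL index
    first, rest = parts[0], parts[1:]

    # Either "cbs" AND "100", or the fused word "cbs100".
    id_terms = "( ( {} & {} ) | {} )".format(desc(acr), desc(first), desc(acr + first))
    for part in rest:
        id_terms += " & " + desc(part)

    return (
        "( ( [emblrelease-FtKey:source] & [emblrelease-FtQualifier:strain] & "
        + id_terms
        + " ) < [emblrelease-Organism:{}*] )".format(organism)
    )


query = on_the_fly_query("CBS", "100.20", "fungi")
print(query)
```

Even for one strain the query needs five index terms and an organism restriction, which is part of the motivation for ID-based links.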
Linking to EMBL (iii)
• Agreement with EBI
• Identification of cross-references from CABRI
catalogues to EMBL (and vice versa) by unique IDs
• Submission of the list to EBI
• ID based links to CABRI included in EMBL data library
and distributed with it
• Use these links when linking from CABRI
• Links from LMBP to EMBL managed differently
Linking to EMBL (iv)
• Work started against EMBL release 79
• Common (new) SRS site for CABRI and EMBL
• Modified indexing -> common keys format
• SRS links established
• Preliminary list of references sent to collections
• Comments returned
Common site established
Common keys format
CABRI indexing: by whole ID
CBS 100.20 -> ‘CBS 100.20’
EMBL indexing: by single words
CBS 100.20 -> ‘CBS’ + ‘100’ + ‘20’
CBS100.20 -> ‘CBS100’ + ‘20’
Common indexing: name (only letters), possibly followed by space, followed by
string (including letters, numbers, dot, dash), punctuation removed
CBS 100.20 -> ‘CBS10020’
CBS100.20 -> ‘CBS10020’
Special case (not currently managed):
NCCB LMD and Phabagen bacteria catalogues
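The two indexing schemes and the common key can be sketched in a few lines. This is a minimal illustration of the rule stated above, not the actual SRS parser code; the function names are hypothetical, and upper-casing the acronym is an assumption made so that both spellings produce an identical key.

```python
import re
from typing import List, Optional

# Name made of letters only, an optional space, then letters/digits/dot/dash.
KEY_PATTERN = re.compile(r"^([A-Za-z]+) ?([A-Za-z0-9.\-]+)$")


def embl_words(identifier: str) -> List[str]:
    """EMBL-style indexing: split into single alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", identifier)


def common_key(identifier: str) -> Optional[str]:
    """Common indexing: acronym plus number with punctuation removed.

    Returns None for identifiers that do not fit the pattern, for instance
    multi-word catalogue names (the special case not currently managed).
    """
    match = KEY_PATTERN.match(identifier.strip())
    if match is None:
        return None
    name, rest = match.groups()
    # Upper-casing is an assumption; dots and dashes are dropped.
    return name.upper() + re.sub(r"[^A-Za-z0-9]", "", rest)


print(embl_words("CBS 100.20"))  # ['CBS', '100', '20']
print(embl_words("CBS100.20"))   # ['CBS100', '20']
print(common_key("CBS 100.20"))  # CBS10020
print(common_key("CBS100.20"))   # CBS10020
```

Because both spellings collapse to the same key, a single indexed field is enough to link an EMBL source description to a CABRI strain number.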
SRS links EMBL - CABRI
#links Embl to Cabri Bact & Fun & Yeasts
$Link: [from:$EMBLRELEASE_DB
to:$BCCM_LMG_DB
fromField:$DF_FtDescription
toField:$DF_CABRI_Strain_number]
$Link: [from:$EMBLRELEASE_DB
to:$CBS_BACT_DB
fromField:$DF_FtDescription
toField:$DF_CABRI_Strain_number]
Automatic identification of links
Custom views (i)
Custom views (ii)
Links to EMBL: current status
Almost ready for submission of the list of cross-references
EBI objection: many databases, some of them small, instead of a single large one
New proposal from EBI
Links added in the SRS site at EBI only
Links not searchable
Links not distributed with EMBL Data Library
Alternative proposals from us
Making CABRI virtual catalogues by resource type (bacteria, cell lines,…)
Making an intermediate database
SRS virtual libraries
SRS Virtual libraries
Include many member libraries
Appear and can be searched as a unique database
Use indexes of member libraries
Member libraries must have a common data structure
CABRI Virtual libraries
Can be created for each resource type
Interconnected Bacteria DB
Interconnected Cell Lines DB
May be created for similar resource types
Interconnected Micro-organisms DB
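The virtual library idea can be illustrated with a small sketch: several member libraries with the same fields are searched as if they were one database. The classes, library names, and field names below are hypothetical and do not reflect how SRS implements virtual libraries; the two sample records reuse strains cited elsewhere in this presentation.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


class Library:
    """A member library: a name, a fixed field structure, and its records."""

    def __init__(self, name: str, fields: Tuple[str, ...], records: List[Record]):
        self.name = name
        self.fields = fields
        self.records = records

    def search(self, field: str, value: str) -> List[Record]:
        # Case-insensitive substring match on one field.
        return [r for r in self.records if value.lower() in r[field].lower()]


class VirtualLibrary:
    """Presents several member libraries as a single searchable database."""

    def __init__(self, name: str, members: List[Library]):
        # Member libraries must share a common data structure.
        if len({m.fields for m in members}) > 1:
            raise ValueError("member libraries must share a common data structure")
        self.name = name
        self.members = members

    def search(self, field: str, value: str) -> List[Tuple[str, Record]]:
        # Delegates to each member and tags every hit with its source library.
        return [(m.name, r) for m in self.members for r in m.search(field, value)]


FIELDS = ("Strain_number", "Name")
lmg = Library("LMG_BACT", FIELDS, [{"Strain_number": "LMG 3589", "Name": "Bacillus subtilis"}])
cip = Library("CIP_BACT", FIELDS, [{"Strain_number": "CIP 70.34", "Name": "Acinetobacter baumannii"}])
bacteria = VirtualLibrary("Interconnected Bacteria DB", [lmg, cip])
hits = bacteria.search("Name", "bacillus")
print(hits)
```

The constructor check mirrors the constraint that member libraries need a common data structure, which is why virtual libraries are proposed per resource type.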
Intermediate database
Intermediate CABRI database would
Include very limited information: identification and name
Be linked by EMBL and link to the related CABRI catalogue
EMBL -> Intermediate db -> CABRI
Example:
Identification CIP 70.34
Name Acinetobacter baumannii
Identification ECACC 88020401
Name Vero
Identification LMG 3589
Name Bacillus subtilis (Ehrenberg 1835) Cohn 1872 AL
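The chain EMBL -> Intermediate db -> CABRI can be sketched as a two-field record plus a lookup. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, the lookup key simply strips spaces and punctuation, and the target catalogue is guessed from the acronym at the start of the identification.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IntermediateRecord:
    identification: str
    name: str


# The three example entries listed above.
RECORDS = [
    IntermediateRecord("CIP 70.34", "Acinetobacter baumannii"),
    IntermediateRecord("ECACC 88020401", "Vero"),
    IntermediateRecord("LMG 3589", "Bacillus subtilis (Ehrenberg 1835) Cohn 1872 AL"),
]


def key(identification: str) -> str:
    """Normalise an identifier: upper-case, letters and digits only."""
    return "".join(ch for ch in identification.upper() if ch.isalnum())


INDEX: Dict[str, IntermediateRecord] = {key(r.identification): r for r in RECORDS}


def resolve(embl_strain_text: str) -> Optional[IntermediateRecord]:
    """First hop: EMBL strain text to the intermediate record, if any."""
    return INDEX.get(key(embl_strain_text))


def catalogue_of(record: IntermediateRecord) -> str:
    """Second hop (assumption): the acronym names the CABRI catalogue to open."""
    return record.identification.split()[0]


hit = resolve("cip70.34")
if hit is not None:
    print(hit.name, "->", catalogue_of(hit))
```

With this layout EMBL needs to link to one database only, while the intermediate record carries just enough information to forward the user to the full catalogue entry.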
Extracted databases
• Intended to improve accessibility of CABRI catalogues by
distributing them in a controlled frame
• Include a subset of information:
CABRI MDS + link to CABRI site (new field Full_details)
• Established agreement with EBI
• Preparation of extracted databases:
• Setting up of a dedicated Web site: http://export.cabri.org/
• Setting up of an FTP site for distributing data and SRS
configuration files: ftp.cabri.org (not anonymous)
• Upload of catalogues to EBI: March 2004
• Automatic updating by FTP through SRS Prisma
Catalogues at EBI
CABRI views in place
Link to CABRI for details & orders
Quick searches at EBI (i)
Quick searches at EBI (ii)
Inventory of data usage and data sets
• GlobalSearch on CABRI site available
• GlobalSearch on partners’ sites
• Not stable
• Partial (give me URLs!)
• Virtual BRCs’ Library
• Map of sites’ maps
• Includes links to archives/databases
• PLEASE SUBMIT YOUR DATA!
That’s all, folks!
MEDLINE
• Links to Medline already in place for many catalogues
• New links added with periodical updates
EMBL
• Common site and index keys in place
• Implementation of links under study with EBI staff
Other external links
• Plasmids’ maps in place
• Micro-organisms images ongoing
Extracted databases
• Procedure implemented
• Dedicated Web and FTP sites available
• Uploaded to EBI in March 2004
Inventory of data usage and data sets
• Search on partners’ site contents (ht://dig) soon available
• List of partners’ site contents (a sort of “Map of sites’ maps”) under construction
Thoughts about the future (i)
• CABRI as it is
• Many links to external databases are being set up and are already in place
for some of the catalogues
• Extracted databases have been uploaded to EBI
• Integration made possible (mainly) because of the adoption of SRS
• CABRI sites are now well known and appreciated, and they use network services
• GBIF perspective
• GBIF has designed a nice and innovative architecture
• Distributed architecture can help management by avoiding conversions
and updates
• It requires a sound expertise and good computer skills, not always
available at collections/BRCs
• The ABCD Schema is not adequate for catalogues’ contents
Thoughts about the future (ii)
• We need to keep current and set up new links
• Current links with the molecular biology world should be kept
• SRS is an essential key for this connection
• Web Services based GBIF architecture must be taken into account for the
future links with the (quickly) evolving biodiversity information environment
• SRS is evolving
• Since SRS 6, XML has been incorporated
• With SRS 7, XML is essential (alternative to flat files)
• With SRS 8, Web Services have been added and SRS itself is able to
provide Web Services and to access them remotely
Thoughts about the future (iii)
• Proposal
• Start by extending the ABCD Schema to meet our needs
• Continue with SRS and follow its evolution
• Adopt as early as possible the new SRS Web Services features and start
offering information to GBIF
• Individual collections/BRCs willing to go autonomous can stop submission
of data, provided they offer an agreed interface for remote access by the
central SRS based system
• Finally, reach a mixed distributed/centralized architecture, based on SRS
and offering both standard SRS services and Web Services