The EBI search engine: EB-eye
Franck Valentin
External Services group
EMBRACE Workshop
CBS, BioCentrum-DTU, February 6-8, 2008
EBI is an Outstation of the European Molecular Biology Laboratory.
Summary
• The data at the EBI
• What is the EB-eye?
• A glance at the web interface
• Web services for the EB-eye
07.11.2015
Web Services Course, CBS, DK
The data at the EBI
[Figure: sample records from EBI resources in their native formats: flat files (ID / AC / DT lines), XML documents, and a hierarchical record (ID, PARENT ID, RANK). Labels on the figure include Ligand, Array Express and Interpro.]
The data at the EBI
• Diversity and heterogeneity of the data (format, size, content…)
• Searching the data at the EBI
  • Most of the data providers have their own search mechanism
  • Heterogeneity of the search results (display, content, granularity…)
  • Navigation between the different resources (references) not consistent
What is the EB-eye?
• Global search mechanism
• Searches most of the EBI resources in one go
• Not specific to any resource
• Unified searches of the EBI resources
• Free-text search (unified semantic)
• Basic results display (Google-like)
• Simple cross reference navigation
• Available on all the EBI web pages
A glance at the web interface
EB-eye results summary page
• Organized into categories called “domains”
• Number of results per domain
• Refine your search
• Expand/Collapse for more details
EB-eye domain result page
• Results for all the resources in a domain
  • A domain can contain several resources
  • First 3 entries displayed for each resource
  • View more entries for a particular resource
• Hierarchy of domains
  • Forward search (smaller set of resources)
  • Backward search (wider set of resources)
• Refine your search
• Navigate the results pages
EB-eye domain result page (one resource)
• Basic information: ID, name, description…
• Link to the main resource web site
• Additional links
• EB-eye internal references
EB-eye cross-references navigation
• Navigate inside the EB-eye
• References context
• Navigation…
  • Using resources explicit references
  • Using resources implicit references
EB-eye Advanced Search
• Accessible from all the pages
• Simple search criteria
• Domain specific search
  • Domain selection
  • Fields selection
  • References
Web services for the EB-eye
• Simple experimental API for basic operations
  • Basic metadata information
  • Basic queries (Full-text and entries)
  • Limited cross-references navigation
• Depending on the usage, we may implement a more complex API and more functionalities
Web services – Listing the domains
List available domains (list only the leaves)
String[] listDomains()
> listDomains()
…
astd
…
emblcds
embldeleted
emblnew_ann_con
emblnew_con
emblnew_standard
emblnew_wgs
emblrelease_ann_con
emblrelease_con
emblrelease_standard
emblrelease_wgs
ensembl
…
Web services – Number of results
Get number of results for a simple query
int getNumberOfResults(String domain, String query)
> getNumberOfResults('medline', 'immunolog* nutrition')
6954
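The results summary page shows the number of hits per domain, and the same overview can probably be assembled from listDomains and getNumberOfResults. The following is a minimal Python sketch of that idea, not the wrapper's actual code: `FakeEBeye` is a hypothetical in-memory stand-in for the service, `hits_per_domain` is an illustrative helper, and every count except the medline figure from the slide is invented.

```python
from typing import Dict, List


class FakeEBeye:
    """Hypothetical in-memory stand-in for the EB-eye service (not the real client)."""

    def __init__(self, counts: Dict[str, Dict[str, int]]) -> None:
        self._counts = counts  # {domain: {query: number of hits}}

    def listDomains(self) -> List[str]:
        return sorted(self._counts)

    def getNumberOfResults(self, domain: str, query: str) -> int:
        return self._counts[domain].get(query, 0)


def hits_per_domain(client, query: str) -> Dict[str, int]:
    """Return {domain: hit count} for every leaf domain with at least one hit."""
    summary = {}
    for domain in client.listDomains():
        count = client.getNumberOfResults(domain, query)
        if count > 0:
            summary[domain] = count
    return summary


QUERY = "immunolog* nutrition"
client = FakeEBeye({
    "medline": {QUERY: 6954},  # count taken from the slide
    "uniprot": {QUERY: 12},    # invented for illustration
    "astd": {},                # no hits for this query
})
summary = hits_per_domain(client, QUERY)
```

Against the real service this costs one call per domain, so a caller may prefer to restrict the loop to the domains of interest.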
Web services – Get results ids
List result IDs for a simple query
String[] getResultsIds(String domain, String query)
String[] getResultsIds(String domain, String query, int start, int size)
> getResultsIds('uniprot', 'polymerase', 0, 5)
A2VB99_9VIRU
Q86777_9CALI
Q779J8_9VIRU
Q8I944_9STIC
Q8I945_9STIC
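The start and size parameters suggest that large result sets are meant to be read page by page. Below is a hedged Python sketch of such a loop. It assumes that start is a zero-based offset (consistent with the call above returning the first five IDs) and that getNumberOfResults gives the total to iterate over; `FakeEBeye` and `iter_result_ids` are hypothetical names used only for this illustration.

```python
from typing import Iterator, List


class FakeEBeye:
    """Hypothetical stand-in that serves a fixed ID list (not the real client)."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = ids

    def getNumberOfResults(self, domain: str, query: str) -> int:
        return len(self._ids)

    def getResultsIds(self, domain: str, query: str, start: int, size: int) -> List[str]:
        # Assumption: start is a zero-based offset into the result list.
        return self._ids[start:start + size]


def iter_result_ids(client, domain: str, query: str, page_size: int = 2) -> Iterator[str]:
    """Yield every result ID, requesting page_size IDs per call."""
    total = client.getNumberOfResults(domain, query)
    for start in range(0, total, page_size):
        for entry_id in client.getResultsIds(domain, query, start, page_size):
            yield entry_id


# The five IDs shown on the slide, served in pages of two.
SLIDE_IDS = ["A2VB99_9VIRU", "Q86777_9CALI", "Q779J8_9VIRU", "Q8I944_9STIC", "Q8I945_9STIC"]
client = FakeEBeye(SLIDE_IDS)
all_ids = list(iter_result_ids(client, "uniprot", "polymerase", page_size=2))
```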
Web services – Get referenced domains
Get referenced domains in a domain or an entry
String[] getDomainsReferencedInEntry(String domain, String entryId)
String[] getDomainsReferencedInDomain(String domain)
> getDomainsReferencedInEntry('ensembl', 'cg2102')
embldeleted
emblnew_ann_con
emblnew_con
emblnew_standard
emblnew_wgs
emblrelease_ann_con
emblrelease_con
emblrelease_standard
emblrelease_wgs
go
taxonomy
uniprot
Web services – Get referenced entries
Get referenced entries for a domain in a particular entry
String[] getReferencedEntries(String domain, String entryId, String referencedDomain)
> getReferencedEntries('ensembl', 'cg2102', 'go')
GO:0005634 GO:0046872 GO:0008270 GO:0016319 GO:0003676 GO:0003677
GO:0045892 GO:0006350 GO:0006355 GO:0007275 GO:0007399 GO:0007402
GO:0007417 GO:0007419 GO:0003700 GO:0009791 GO:0030154
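Cross-reference navigation can be seen as two steps: ask which domains an entry references, then fetch the referenced IDs for each of them. The Python sketch below chains getDomainsReferencedInEntry and getReferencedEntries in that way. `FakeEBeye` and `all_references` are hypothetical names; the GO identifiers are the first three from the slide, while the taxonomy value is a placeholder rather than real data.

```python
from typing import Dict, List, Tuple


class FakeEBeye:
    """Hypothetical stand-in holding a few cross-references (not the real client)."""

    def __init__(self, refs: Dict[Tuple[str, str], Dict[str, List[str]]]) -> None:
        self._refs = refs  # {(domain, entry_id): {referenced_domain: [ids]}}

    def getDomainsReferencedInEntry(self, domain: str, entry_id: str) -> List[str]:
        return sorted(self._refs[(domain, entry_id)])

    def getReferencedEntries(self, domain: str, entry_id: str,
                             referenced_domain: str) -> List[str]:
        return self._refs[(domain, entry_id)][referenced_domain]


def all_references(client, domain: str, entry_id: str) -> Dict[str, List[str]]:
    """Collect the referenced IDs of one entry, grouped by referenced domain."""
    return {
        ref_domain: client.getReferencedEntries(domain, entry_id, ref_domain)
        for ref_domain in client.getDomainsReferencedInEntry(domain, entry_id)
    }


client = FakeEBeye({
    ("ensembl", "cg2102"): {
        "go": ["GO:0005634", "GO:0046872", "GO:0008270"],  # first three IDs on the slide
        "taxonomy": ["<taxonomy id>"],                      # placeholder, not real data
    }
})
refs = all_references(client, "ensembl", "cg2102")
```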
Web services – External cross-references
List non EB-eye domains referenced in a domain
String[] listAdditionalReferenceFields(String domain)
> listAdditionalReferenceFields('msdpdb')
CATH
PFAM
SCOP
Web services – The fields
[Diagram: XML files (e.g. a MedlineCitation record, PMID 10997935) and flat files (e.g. EMBL entry AF030562) → dump file (XML) → Db]

Flat file (EMBL entry):
ID AF030562; SV 1; linear; genomic DNA; STS; FUN; 852 BP.
XX
AC AF030562;
XX
DT 04-DEC-1997 (Rel. 53, Created)
DT 03-MAR-2000 (Rel. 62, Last updated, Version 2)
XX
DE Fusarium venenatum clone VEN-A RAPD band generated using Operon primer
DE OPW-03, sequence tagged site.
XX
KW STS.
XX
OS Fusarium venenatum
OC Eukaryota; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes;
OC Hypocreomycetidae; Hypocreales; mitosporic Hypocreales; Fusarium.
XX
RN [1]
RP 1-852
RA Yoder W.T., Christianson L.M.;
RT "Species-specific primers resolve members of the section Fusarium.
RT Taxonomic status of the edible 'Quorn' fungus re-evaluated";
RL Fungal Genet. Biol. 0:0-0(1997).
XX
RN [2]
RP 1-852
RA Yoder W.T., Christianson L.M.;
RT ;
RL Submitted (21-OCT-1997) to the EMBL/GenBank/DDBJ databases.
RL Microbiology, Novo Nordisk Biotech, Inc., 1445 Drew Ave., Davis, CA 95616,
RL USA
XX
FH Key Location/Qualifiers
FH
FT source 1..852
FT /organism="Fusarium venenatum"
FT /strain="ATCC20334"
...
Fields: id (value stored), acc (value stored), creation_date / last_modification_date (values non stored), description (value stored), organism_species (value non stored), organism_classification (value non stored), references (non stored)

XML file (Medline citation):
<MedlineCitationSet>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID>14216186</PMID>
<DateCreated>
<Year>1965</Year>
<Month>02</Month>
<Day>01</Day>
</DateCreated>
<DateCompleted>
<Year>1996</Year>
<Month>12</Month>
<Day>01</Day>
</DateCompleted>
<DateRevised>
<Year>2007</Year>
<Month>03</Month>
<Day>01</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0009-8981</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>10</Volume>
<PubDate>
<Year>1964</Year>
<Month>Jul</Month>
</PubDate>
</JournalIssue>
<Title>Clinica chimica acta; international journal of clinical chemistry</Title>
<ISOAbbreviation>Clin. Chim. Acta</ISOAbbreviation>
</Journal>
…
Fields: id (value stored), creation_date (value non stored), last_modification_date (value non stored), issn (value non stored), volume (value stored), name (value non stored)

Dump file (XML):
<database>
<name>IntAct.Experiment</name>
<description>Experimental procedures that allowed to…</description>
<release>1.0</release>
<release_date>2007-Feb-16</release_date>
<entry_count>5697</entry_count>
<entries>
<entry id="EBI-77680">
…
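To make the mapping from source lines to fields concrete, here is a small Python sketch that pulls a few fields out of the EMBL flat-file sample by its two-letter line codes. It only illustrates the idea and is not how the EB-eye indexer is implemented, which the slides do not describe. The `LINE_TO_FIELD` table, the `STORED` set and `parse_fields` are hypothetical names, and reading "value stored" as "retrievable through the web services" is an interpretation of the slide labels.

```python
from typing import Dict, List

ENTRY = """\
ID   AF030562; SV 1; linear; genomic DNA; STS; FUN; 852 BP.
XX
AC   AF030562;
XX
DE   Fusarium venenatum clone VEN-A RAPD band generated using Operon primer
DE   OPW-03, sequence tagged site.
XX
OS   Fusarium venenatum
"""

# Line code -> field name; the stored/non-stored split follows the slide labels.
LINE_TO_FIELD = {"ID": "id", "AC": "acc", "DE": "description", "OS": "organism_species"}
STORED = {"id", "acc", "description"}


def parse_fields(text: str) -> Dict[str, str]:
    """Group the line values by field name and join continuation lines."""
    parts: Dict[str, List[str]] = {}
    for line in text.splitlines():
        code, _, value = line.partition(" ")
        field = LINE_TO_FIELD.get(code)
        if field:
            parts.setdefault(field, []).append(value.strip())
    fields = {name: " ".join(chunks) for name, chunks in parts.items()}
    # The ID and AC lines carry more than the identifier; keep the first token.
    for name in ("id", "acc"):
        if name in fields:
            fields[name] = fields[name].split(";")[0]
    return fields


fields = parse_fields(ENTRY)
# Only the stored fields would be returned as values; the others are search-only.
stored = {name: value for name, value in fields.items() if name in STORED}
```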
Web services – The fields
List available (stored) fields in a domain
String[] listFields(String domain)
> listFields('uniprot')
acc_number
description
id
name
Web services – Get results with fields
List result fields values for a simple query
String[][] getResults(String domain, String query, String[] fields, int start, int size)
> getResults('uniprot', 'polymerase', ['acc', 'id', 'description'], 0, 5)
acc            description                   id
------------------------------------------------------------------
A2VB99         Polymerase.                   A2VB99_9VIRU
Q86777         RNA polymerase (Fragment).    Q86777_9CALI
Q779J8 Q0E5A0  DNA polymerase (EC 2.7.7.7).  Q779J8_9VIRU
Q8I944         DNA polymerase (EC 2.7.7.7).  Q8I944_9STIC
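Because getResults returns a plain two-dimensional array, client code usually wants to re-attach the field names to each row. The Python sketch below does this with a hypothetical helper, `rows_to_records`. It assumes each row lists its values in the order of the requested fields, which the slides do not confirm (the printed table is ordered differently from the request). The two rows are copied from the output above.

```python
from typing import Dict, List


def rows_to_records(fields: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn a getResults-style 2D array into one {field: value} dict per entry."""
    records = []
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(row)}")
        records.append(dict(zip(fields, row)))
    return records


# Assumption: values arrive in the order of the requested fields.
fields = ["acc", "id", "description"]
rows = [
    ["A2VB99", "A2VB99_9VIRU", "Polymerase."],
    ["Q86777", "Q86777_9CALI", "RNA polymerase (Fragment)."],
]
records = rows_to_records(fields, rows)
```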
Web services – Get result fields values for entries
Get result fields values for one or several entries
String[] getEntry(String domain, String entryId, String[] fields)
String[][] getEntries(String domain, String[] entryIds, String[] fields)
> getEntry('medline', '7605758', ['description', 'publication_date', 'authors'])
description :
BACKGROUND AND OBJECTIVES: Intraspinally administered alpha 2-adrenergic
agonists produce analgesia in part by causing spinal acetylcholine and nitric
oxide (NO) release. Clonidine-induced analgesia is enhanced by subarachnoid
neostigmine and inhibited by N-methyl-L-arginine (NMLA), a blocker of NO
synthesis. The authors tested whether dexmedetomidine, an alpha [...]
publication_date :
1995 Mar-Apr
authors :
Bouaziz H.
Hewitt C.
Eisenach J.C.
Web services – Get the urls
Returns the URLs configured for a field of an entry
String[] getEntryFieldUrls(String domain, String entry, String[] fields)
String[][] getEntriesFieldUrls(String domain, String[] entries, String[] fields)
> getEntryFieldUrls('uniprot', 'Q9QUZ9_9MURI', ['id'])
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[UNIPROT:Q9QUZ9_9MURI]+-newId
Web services – Referenced entries from a domain
List of referenced entries from a domain referenced in a set of entries
String[][] getReferencedEntriesFlatSet(String domain, String[] entries, String referencedDomain, String[] fields)
dict(String[][]) getReferencedEntriesSet(String domain, String[] entries, String referencedDomain, String[] fields)
> getReferencedEntriesSet('ensembl', ['AAEL005345', 'CG2102'], 'go', ['id', 'name'])
'AAEL005345' ->
[GO:0016319, 'mushroom body development'],
[GO:0045892, 'negative regulation of transcription,DNA-dependent'],
[GO:0007417, 'central nervous system development'],
[GO:0009791, 'post-embryonic development']
'CG2102' ->
[GO:0005634, 'nucleus'],
[GO:0046872, 'metal ion binding'],
[GO:0008270, 'zinc ion binding'],
[GO:0016319, 'mushroom body development'],
[GO:0003676, 'nucleic acid binding'],
[GO:0003677, 'DNA binding'],
...
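One use of the per-entry dictionary returned by getReferencedEntriesSet is to compare entries, for example to find the GO terms that two genes have in common. The Python sketch below works on a literal copy of the slide output and assumes the result maps each entry ID to a list of [id, name] rows. `shared_terms` and `names_by_id` are hypothetical helper names. The CG2102 list is truncated on the slide, so the intersection only covers the rows that are shown.

```python
from typing import Dict, List, Set

# Copied from the slide output; the CG2102 list is truncated there ("...").
refs: Dict[str, List[List[str]]] = {
    "AAEL005345": [
        ["GO:0016319", "mushroom body development"],
        ["GO:0045892", "negative regulation of transcription,DNA-dependent"],
        ["GO:0007417", "central nervous system development"],
        ["GO:0009791", "post-embryonic development"],
    ],
    "CG2102": [
        ["GO:0005634", "nucleus"],
        ["GO:0046872", "metal ion binding"],
        ["GO:0008270", "zinc ion binding"],
        ["GO:0016319", "mushroom body development"],
        ["GO:0003676", "nucleic acid binding"],
        ["GO:0003677", "DNA binding"],
    ],
}


def shared_terms(refs: Dict[str, List[List[str]]]) -> Set[str]:
    """Return the referenced IDs that occur for every entry in refs."""
    id_sets = [{row[0] for row in rows} for rows in refs.values()]
    return set.intersection(*id_sets) if id_sets else set()


def names_by_id(refs: Dict[str, List[List[str]]]) -> Dict[str, str]:
    """Build a lookup from referenced ID to its name field."""
    return {row[0]: row[1] for rows in refs.values() for row in rows}


shared = shared_terms(refs)
lookup = names_by_id(refs)
labels = {go_id: lookup[go_id] for go_id in shared}
```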
Web services – Links
• WSDL: http://www.ebi.ac.uk/ebisearch/service.ebi?wsdl
• Documentation: http://www.ebi.ac.uk/Tools/webservices/services/eb-eye
• Feedback! http://www.ebi.ac.uk/support/
Web services – Let’s play !
• 2 wrappers to hide the SOAP hassle
  • EBeyeWSWrapper.pm
  • EBeyeWSWrapper.py
• Test files to play with
  • testEBeyeWSWrapper.pl
  • testEBeyeWSWrapper.py