SIMDAT AGM06

IXodus
a knowledge discovery process based
on the SIMDAT-Pharma GRID
technologies
Richard Kamuzinzi
Université Libre de Bruxelles – Bioinformatics
June 5–7, 2007
World Wide Workflow GRID ASIA 2007
Singapore
SIMDAT
SIMDAT Facts
• EU Information Society Technologies (IST)
• GRID Project
• Duration: 4 years
• Start date: September 1st 2004
• 26 partners
Scope
• Product and process development
(automobiles, aircraft, drugs,
meteorological services):
– is complex
– involves several independent
organizations at different locations
• Managing this complexity at a single site is too
expensive => cost/risk sharing with
partners => GRID
Strategic objectives
• to test and enhance Data Grid technology for
product development and production process
design,
• to develop federated versions of problem-solving
environments by leveraging enhanced
Grid services,
• to exploit Data Grids as a basis for distributed
knowledge discovery,
• to promote de facto standards for these
enhanced Grid technologies across a range of
disciplines and sectors, and
• to raise awareness of the advantages of Data
Grids in important industrial sectors
Project organization (SIMDAT-Pharma)
NEC, GSK, Inpharmatica, ULB,
Fraunhofer SCAI-Bio and UKA
IXodus – The scientific problem
• Lyme disease: significant source of human and
animal pathology in temperate areas of the world
(identified in 90s)
• Caused by the bite of a tick of the genus Ixodes
infected by the pathogenic bacterium Borrelia
burgdorferi
• The study of host-parasite interactions is an
active research area, as ~20% of ticks have been found
to be infected by the bacterium
• IXodus scientific protocol: designed to
characterise the genes expressed in the
salivary gland of the tick Ixodes ricinus at
various stages of the host-parasite interaction
process
IXodus – Workflow design (1)
• From the IXodus scientific protocol to the IXodus
workflow (WF) design, we identify 2 use cases:
1. “New cDNA sequences”: the workflow is
fed daily with a batch of nucleic
sequences from the systematic sequencing
of thousands of salivary gland cDNAs
2. “Databank update”: whenever a new
version of a relevant biological databank
appears, the core workflow analysis is
re-enacted to discover potentially new
information
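The two use cases can be read as two kinds of trigger for the same core analysis. The sketch below is only an illustration of that idea: the event format, the function name `select_work` and the version strings are assumptions made for the example, not part of the IXodus implementation.

```python
def select_work(event, last_used):
    """Return the sequence IDs the core analysis should (re)run on.

    event:     dict with a "kind" key (hypothetical event format)
    last_used: {sequence_id: {databank_name: version}}, the databank
               versions recorded at each sequence's previous analysis
    """
    if event["kind"] == "new_cdna":
        # Use case 1: every sequence in the daily batch is analysed.
        return sorted(event["sequence_ids"])
    if event["kind"] == "databank_update":
        # Use case 2: re-enact the analysis for every stored sequence whose
        # last run did not use the newly released databank version.
        db, version = event["databank"], event["version"]
        return sorted(
            seq_id for seq_id, used in last_used.items()
            if used.get(db) != version
        )
    raise ValueError(f"unknown event kind: {event['kind']!r}")


# Hypothetical data: "r1" and "r2" are placeholder version labels.
last_used = {"seq1": {"EMBL": "r1"}, "seq2": {"EMBL": "r2"}}
update = {"kind": "databank_update", "databank": "EMBL", "version": "r2"}
print(select_work(update, last_used))  # ['seq1']
```

Recording the databank version per sequence is one simple way to decide what to re-enact; re-running every stored sequence on each update would be an equally valid reading of the use case.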
IXodus design (2)
[Figure: IX-odus UML 2.0 activity diagram, use case 1: "New cDNA sequences". Actors: Scientist, SYSTEM. Actions: Provide new cDNA; Gathering; Compare with IX-odus DB (guards [similar AND (exact part)] and [similar AND NOT (exact part)]); Build new virtual sequence; DB Annotate "group membership"; Pre-processing; then the main analysis part: make BlastN, make BlastX, make TBlastX (guards [similar] / [else]), ORF finder and Motif search (guards [found] / [else]); DB Annotate "success", "potential new", "motif found"; Analysis by domain expert. Datastores: IX-odus, EMBL, UNIPROT/GENPEPT, INTERPRO.]
IXodus design (3)
[Figure: IX-odus UML 2.0 activity diagram, use case 2: "Databank update". Actors: Scientist, SYSTEM, SYSTEM Administrator. Actions: Update Databank; Event processing (timer "after one week"); the main analysis part: make BlastN, make BlastX, make TBlastX (guards [similar] / [else]), Send TBlastX notification, Receive notification, Analysis by domain expert, ORF finder and Motif search (guards [found] / [else]); DB Annotate "success", "potential new", "motif found". Datastores: IX-odus, EMBL, UNIPROT/GENPEPT, INTERPRO.]
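The main analysis part shared by both activity diagrams can be sketched as a cascade of guarded steps. This is one plausible reading of the diagrams, not the authors' implementation: the branch order, the rule that Motif search runs only after an ORF is found, and the label "needs expert analysis" are assumptions. Each tool is passed in as a hypothetical callable that returns True when the [similar] or [found] guard holds.

```python
from typing import Callable, Dict, List

# A step takes a nucleotide sequence and reports whether its guard holds.
Step = Callable[[str], bool]


def annotate(sequence: str, tools: Dict[str, Step]) -> List[str]:
    """Return the annotation labels one sequence would receive."""
    # BlastN, then BlastX: a similar hit ends the cascade with "success".
    for name in ("blastn", "blastx"):
        if tools[name](sequence):
            return ["success"]
    # TBlastX hit: notify the scientist for analysis by a domain expert
    # (the label below is a made-up placeholder for that hand-over).
    if tools["tblastx"](sequence):
        return ["needs expert analysis"]
    labels: List[str] = []
    # No similarity found: look for an open reading frame.
    if tools["orf_finder"](sequence):
        labels.append("potential new")
        # Assumed: the motif search only runs once an ORF has been found.
        if tools["motif_search"](sequence):
            labels.append("motif found")
    return labels


# Stub tools for illustration; real steps would call remote services.
stubs = {
    "blastn": lambda s: False,
    "blastx": lambda s: False,
    "tblastx": lambda s: False,
    "orf_finder": lambda s: True,
    "motif_search": lambda s: True,
}
print(annotate("ATGAAATAG", stubs))  # ['potential new', 'motif found']
```

Passing the tools as callables keeps the cascade independent of where each tool actually runs, which is the separation the deck later calls "abstract" versus "operational" WF.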
IXodus – Implementation
• Workflow technology platform:
InforSense™ KDE
• Implementation is tightly coupled
with the deployment environment,
which is mainly driven by 2 kinds of
constraints:
– GRID approach
– Semantic Web (SW) approach
IXodus implementation - The test-bed
GRID approach
Main properties
• Federated data and services with redundancy
• Privacy, AuthZ, AuthN, non-repudiation
• Intellectual Property Rights (IPR) preservation by traceability (digital signatures)
• User profile management to optimise resource availability
[Figure: test-bed architecture. ULB client side: InforSense KDE with plugins (labels: Semantic Broker, IPRSCAN, Bio Tools, BioSense, GRIA client, E2E Sec client) and the IXodus knowledge DB. NEC: Semantic Broker (semantic-enabled service publication and discovery, OWL_DL reasoning). Across the Internet: EMBL-services and ULB-services sites, each showing NoDynA, E2E Sec server, GRIA, MRS, Web Service wrappers and EMBOSS & BLAST tools.]
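"Federated data and services with redundancy" implies that the same tool is reachable at more than one site. A minimal sketch of how a client might exploit that, assuming each replica is exposed as a plain Python callable (the function `call_with_failover` and the stub services are hypothetical, and no GRIA API is used here):

```python
def call_with_failover(replicas, request):
    """Try each (site, service) replica in order.

    Returns (site, result) from the first replica that answers.
    Raises RuntimeError if every replica fails with ConnectionError.
    """
    errors = []
    for site, service in replicas:
        try:
            return site, service(request)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next replica.
            errors.append(f"{site}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


def unavailable(request):
    # Stub standing in for a site that cannot be reached.
    raise ConnectionError("site unreachable")


def echo_blast(request):
    # Stub standing in for a working remote BLAST wrapper.
    return f"hits for {request}"


replicas = [("EMBL-services", unavailable), ("ULB-services", echo_blast)]
print(call_with_failover(replicas, "seq1"))  # ('ULB-services', 'hits for seq1')
```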
IXodus implementation - The test-bed
SW approach
Main properties
• Semantic-enabled service annotation
• Semantic-enabled service discovery
• Dynamic update of already annotated services
Example discovery query: “Which service instance can operate on the latest version of the EMBL databank?”
[Figure: test-bed architecture, as in the GRID approach slide. ULB client side: InforSense KDE with plugins (labels: Semantic Broker, IPRSCAN, Bio Tools, BioSense, GRIA client, E2E Sec client). NEC: Semantic Broker (semantic-enabled service publication and discovery, OWL_DL reasoning, service advertising). Across the Internet: service sites labelled EMBL, ULB and NEC, each showing NoDynA, E2E Sec server, GRIA, MRS, Web Service wrappers and EMBOSS & BLAST tools.]
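The discovery query quoted on the slide ("Which service instance can operate on the latest version of the EMBL databank?") can be illustrated without any reasoner. The sketch below uses plain Python records as a stand-in for the OWL_DL annotations held by the Semantic Broker; the `ServiceAdvert` fields, the integer release numbers and the rule that "latest" means the highest advertised release are all assumptions for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceAdvert:
    """Hypothetical, simplified service annotation."""
    site: str
    tool: str
    databank: str
    databank_release: int  # placeholder release counter


def instances_on_latest(adverts: List[ServiceAdvert], databank: str) -> List[ServiceAdvert]:
    """Return the adverts that serve the highest advertised release of `databank`."""
    matching = [a for a in adverts if a.databank == databank]
    if not matching:
        return []
    latest = max(a.databank_release for a in matching)
    return [a for a in matching if a.databank_release == latest]


adverts = [
    ServiceAdvert("ULB", "BlastN", "EMBL", 1),
    ServiceAdvert("EMBL", "BlastN", "EMBL", 2),
    ServiceAdvert("NEC", "BlastX", "UNIPROT", 5),
]
for advert in instances_on_latest(adverts, "EMBL"):
    print(advert.site, advert.tool)  # EMBL BlastN
```

A dynamic update of an already annotated service then amounts to replacing its advert with one carrying the new release number, after which the same query returns a different instance.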
IXodus implementation – InforSense KDE
The complete Workflow
IXodus implementation – InforSense KDE
User sequences gathering
IXodus implementation – InforSense KDE
Management of overlapping sequences
IXodus implementation – InforSense KDE
Main analysis flow (Bioinformatics tools)
IXodus implementation – InforSense KDE
Service instance selection & launching
IXodus - General benefits
• Workflow tool maturity: designing
complex WFs that support demanding
problems within a reasonable delivery
time is a reality (RWD vs. RAD)
• The WF-on-GRID approach is really
valuable and provides the confidence
we need to face the data/services
“tsunami” in Life sciences… the good
news is …
IXodus - General benefits (2)
...thanks to WF
technologies,
scientists no longer fear
the vertiginous “beast”
(data/services
explosion)…
IXodus – Remaining challenges
• B2A Grids: we still need a precise understanding of the strategic
benefits for both sides (“win-win”)
• WF technologies: need a better distinction between “abstract” WF
and “operational” WF:
– How to decouple?
– Runtime service selection using the concept of rules?
• At design phase: the designer would appreciate a semantic
approach to searching for services
• From WF to Service:
– Partial (∑args) vs. Complete (∑args)
– Different user profiles
• From WF to UI:
– At design phase: need to define how WF actors interact with
the whole system
• Leveraging the WF log to generate textual information
that would support the writing of scientific papers/notebooks (who,
service_name, service_version, database_version, …)
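The last challenge, turning the WF log into text for papers or notebooks, can be sketched as a simple template over log entries. The field names (who, service_name, service_version, database_version) come from the slide; the log structure, the sentence template and the sample values are assumptions made for illustration.

```python
def describe_run(log):
    """Render workflow log entries as plain sentences for a methods section.

    log: list of dicts with the keys who, service_name, service_version
         and database_version (hypothetical log format).
    """
    template = (
        "{who} ran {service_name} (version {service_version}) "
        "against {database_version}."
    )
    return " ".join(template.format(**entry) for entry in log)


# Placeholder values only; no real versions are implied.
log = [
    {
        "who": "analyst_1",
        "service_name": "BlastN",
        "service_version": "X",
        "database_version": "EMBL release N",
    },
]
print(describe_run(log))
# analyst_1 ran BlastN (version X) against EMBL release N.
```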
SIMDAT - Major outcomes to expect
The SIMDAT approach will provide state-of-the-art
components
• To enable an industry-strength environment for
eScience activities
• To support academia/industry collaborations
in R&D activities (B2B & B2A Grids)
– B2A Grids: how exactly is the “win-win” model
configured?
• To help build up virtual organisations that
federate data, services and scientific expertise
Thank you !
Web: http://www.simdat.org
Contact: [email protected]
Acknowledgments
• Co-author: Robert Herzog, Université Libre de Bruxelles (ULB)
• Scientific expert: Valérie Ledent, ULB
• Edmond Godfroid & Bernard Couvreur: Laboratory of Applied
Genetics, ULB
• SIMDAT colleagues: Joseph Mavor (ULB), Falk Zimmermann
(NEC), Changtao Qu (NEC), Nabeel Azam (InforSense),
Moustapha Ghanem (InforSense), Kai Kumpf (SCAI-Bio)