Welcome to the OMII

Download Report

Transcript Welcome to the OMII

Distributed Data Linking using
OGSA-DAI and OGSA-DQP
Ally Hume
4th International Conference on e-Social Science
18 June 2008
Web: www.omii.ac.uk
Email: [email protected]
Motivation
• Grid is about sharing resources.
• OGSA-DAI is concerned with sharing structured data.
Relational
Database
XML
Database
Indexed
File
Web: www.omii.ac.uk
Email: [email protected]
Sharing data via website download
• Zip up the dataset and put it on a website.
• Pros:
o
Easy for data provider.
• Cons:
o
o
o
Possible very large download of only small portion
required.
User has to install data into a local database to use it.
Static snapshot.
Web: www.omii.ac.uk
Email: [email protected]
Sharing data via direct access
• Tell client the URL of the database and a
username and password, e.g:
o
mycomputer.epcc.ed.ac.uk:3306, userID, password
• Pros:
o
Fast.
• Cons:
o
o
o
o
o
Firewall issues.
User and password management is hard.
No consistent security model.
Hard to use in Grid/Web service workflows.
Continued on next slide…
Web: www.omii.ac.uk
Email: [email protected]
Sharing data via direct access
• Cons (continued)
o
o
o
o
No server side layer in which to standardize database
heterogeneities.
Client needs to know, and have installed, correct
driver for the database.
Different drivers for Java, C#, C++, Fortran etc.
Totally different API for different database types, e.g.
JDBC for Relational, XMLDB for XML, Lucene for
indexed files.
Web: www.omii.ac.uk
Email: [email protected]
Domain-specific web service
• Web service that gives access to the data using
domain-specific operations, e.g.
o
o
o
Book findByISBN(ISBN)
List<Book> findByAuthor(Author)
List<Book> findByKeyword(Word)
• Pros:
o
o
o
o
Fits with grid/web service approach.
Abstraction hides back-end database details.
Web services are programming language neutral.
Operations likely to map well to authorization policies.
Web: www.omii.ac.uk
Email: [email protected]
Domain-specific web service cont.
• Cons:
o
Slower than direct access
• SOAP overhead.
• SOAP very poor encoding for large result sets.
o
Domain-specific API prevents use of generic data exploration,
mining and manipulation tools.
Books
Cancer
Generic Data Linking
Application
Books written
by University
employees
Web: www.omii.ac.uk
University
Employees
University
employees in
1932 who have
since died of
cancer
Email: [email protected]
OGSA-DAI generic web service
• OGSA-DAI gives access to data via a generic web
service.
• Clients sees the data in its ‘raw’ format, e.g.
o
o
Tables, columns, rows for relational data.
Collections, elements etc. for XML data
• Clients can obtain the schema of the data.
• Clients send queries in appropriate query language, e.g.
SQL, XPath.
Relational
Database
request
OGSA-DAI
XML
Database
data
Indexed
File
Web: www.omii.ac.uk
Email: [email protected]
OGSA-DAI
• Pros:
o
o
o
Fits with grid/web service approach.
Web services are programming language neutral.
Access to schema and raw data supports generic tools.
Web: www.omii.ac.uk
Email: [email protected]
Reducing use of SOAP
• But SOAP is still slow for data delivery.
• OGSA-DAI supports workflows to reduce data
movement and the use of SOAP.
OGSA-DAI
Transform
Web Service
FTP Server
Web: www.omii.ac.uk
OGSA-DAI
Query ->
Transform ->
DeliverToFTP
FTP Server
Email: [email protected]
OGSA-DAI workflows
•
•
•
•
Workflows composed of pipelined activities.
Activities are installed at the server.
Data streams between activities.
Activities for data querying, data transforms,
data integration and data delivery.
• Allows some computation to be moved closer
to the data.
Web: www.omii.ac.uk
Email: [email protected]
OGSA-DAI Workflow Example
Access
SQLQuery
ObtainFromHTTP
SELECT * FROM Bands
WHERE name = Bangles;
http://www.someplace.org/styl
esheets/webRowSetToHTML.xsl
tuples
Transform
TupleToWebRowSetCharArrays
XSL
WebRowSet XML
XSLTransform
HTML
Deliver
DeliverToURL
ftp://www.musicplace.org/bands/Bangles.html
Web: www.omii.ac.uk
Email: [email protected]
Data linking scenario 1: VOTES
• Virtual Organisations for Trials and
Epidemiological Studies (VOTES).
o
•
•
•
•
http://labserv.nesc.gla.ac.uk/projects/votes/
Linking records related to medial trials.
Linking key is Scotland’s CHI number.
Data stored in relational databases.
Perform joins over distributed data.
Web: www.omii.ac.uk
Email: [email protected]
Data linking scenario 1 : VOTES
CHI
Name
Sex
DOB
13101955448
John Smith
M
14/8/1965
23045635248
Mary Jones
F
24/11/2000
CHI
Diagnosis
13101955448
Lung cancer
23045635248
Leukaemia
CHI
Sex
DOB
Diagnosis
13101955448
M
14/8/1965
Lung cancer
23045635248
F
24/11/2000
Leukaemia
Web: www.omii.ac.uk
Email: [email protected]
Data linking scenario 1 : VOTES
workflow
DB1
OGSA
DAI
DB2
SELECT CHI, Sex, DOB
FROM Patients
ORDER BY CHI
SQLQuery
(DB1)
(CHI, Sex, DOB)
(CHI, Sex, DOB, Diagnosis)
Ordered data
streams
SQLQuery
(DB2)
Merge
Join
(CHI, Diagnosis)
SELECT CHI, Diagnosis
FROM TrialX
ORDER BY CHI
Web: www.omii.ac.uk
Email: [email protected]
Deliver
Data linking scenario 1 : VOTES
workflow
DB1
OGSA
DAI
DB2
SELECT CHI, Sex, DOB
FROM Patients
WHERE CHI in (?)
ORDER BY CHI
Ordered data
stream
SQLQuery
(DB2)
SQL Nested
In Clause
Join
(DB1)
(CHI, Sex, DOB, Diagnosis)
(CHI, Diagnosis)
SELECT CHI, Diagnosis
FROM TrialX
ORDER BY CHI
Web: www.omii.ac.uk
Email: [email protected]
Deliver
Data linking across multiple servers
• Data linking across OGSA-DAI services
Workflow 1
OGSA
DAI
DB1
Data
Workflow 2
SQLQuery
(DB1)
Deliver to
OGSA-DAI
OGSA
DAI
DB2
Receive from
OGSA-DAI
JOIN
SQLQuery
(DB2)
Web: www.omii.ac.uk
Email: [email protected]
Deliver
Data linking scenario 2 - CDC
•
•
•
•
Centers for Disease Control and Prevention
Real-time Outbreak and Disease Surveillance
Unions over distributed data.
Data in Health Level 7 format:
Date
17/5/2008
17/5/2008
Reason
Chest pain
Flu
Age
43
64
Sex
M
F
Zip
15234
15112
• Task: For each postcode count the number of
occurrences of reason ‘X’.
Web: www.omii.ac.uk
Email: [email protected]
CDC RODS scenario
Flu occurrence 17/5/2007
Zip
Count
15112
2
15144
1
DB2
DB1
Date
17/5/2008
17/5/2008
Reason
Chest pain
Flu
Age
43
64
Sex
M
F
Web: www.omii.ac.uk
Zip
15234
15112
Date
17/5/2008
17/5/2008
Reason
Flu
Flu
Age
51
23
Email: [email protected]
Sex
M
F
Zip
15144
15112
CDC RODS Scenario: Merging using project
workflow
SELECT zip, count(*) as total
FROM Cases
WHERE Reason = “Flu”
GROUP BY zip
ORDER BY zip
SQLQuery
(DB1)
SQLQuery
(DB2)
SELECT zip, count(*) as total
FROM Cases
WHERE Reason = “Flu”
GROUP BY zip
ORDER BY zip
Web: www.omii.ac.uk
(15112, 1)
Ordered data
streams
(15112, 1)
(15144, 1)
DB1
OGSA
DAI
DB2
Project
(15112, 2)
(15144, 1)
(group by zip
and
sum(total))
Email: [email protected]
New database in Open MRS format
DB3
encounter
id
date
500
17/6/2008
patient_id
10000
gender birth
M
24/11/2005
person_address
Person_id zip
2000
15112
concept
id
name
Flu
23
FLU
observation
encounter_id concept_id
500
23
person
id
2000
patient
id
person_id
10000 2000
Define view
events
DB4
date
reason
17/6/2008 Flu
Web: www.omii.ac.uk
age
sex
zip
2
M
15112
Email: [email protected]
CDC scenario: using a view
DB1
workflow
SELECT zip, count(*) as total
FROM Cases
WHERE Reason = “Flu”
GROUP BY zip
ORDER BY zip
SQLQuery
(DB1)
SQLQuery
(DB2)
SELECT zip, count(*) as total
FROM Cases
WHERE Reason = “Flu”
GROUP BY zip
ORDER BY zip
OGSA
DAI
DB2
DB4
(15112, 1)
Ordered data
streams
(15112, 1)
(15144, 1)
SQLQuery
(DB4)
Web: www.omii.ac.uk
Project
(15112, 2)
(15144, 1)
(group by zip
and
sum(total))
View
Project
(group by zip
and
sum(total))
(15112, 1)
Email: [email protected]
DB3
(15112, 3)
(15144, 1)
OGSA-DQP
• Distributed Query Processing
• Allows tables in multiple databases to appear as tables
in one database. Can do joins and unions over the
tables.
OGSA-DAI
Virtual
Green DB
OGSA-DQP
OGSA-DAI
Blue DB
OGSA-DAI
Yellow DB
Web: www.omii.ac.uk
Email: [email protected]
CDC Scenario: using OGSA-DQP
DB1
workflow
OGSA
DAI
DB5
DB2
OGSADQP
DB4
SELECT zip, count(*) as total
FROM DB1.Cases UNION DB2.Cases UNION DB4.Cases
WHERE Reason = “Flu”
GROUP BY zip
ORDER BY zip
(15112, 3)
(15144, 1)
SQLQuery
(DB5)
Web: www.omii.ac.uk
Email: [email protected]
View
DB3
CDC Scenario: View over OGSA-DQP
workflow
DB1
OGSA
DAI
DB6
View
DB5
DB2
OGSADQP
SELECT zip, count(*) as total
FROM Cases
WHERE Reason = “Flu”
GROUP BY zip
ORDER BY zip
SQLQuery
(DB6)
Web: www.omii.ac.uk
(15112, 3)
(15144, 1)
DB4
View
DB3
Cases:
SELECT * FROM
DB1.Cases UNION DB2.Cases UNION
DB4.Cases
Email: [email protected]
OGSA-DAI views for schema transforms
Date
Site
11/10/1993
11/10/1993
31/10/1993
N654 PIRU
N654 BEPA
N654 PIRU
Date
Site
11/10/1993 N654
13/10/1993 1
13/10/1993 1
Web: www.omii.ac.uk
Species Area
Count
2
1
2
26
3
29
Date
Site
Picrub
Betpap
13/10/1993
14/11/1993
1
1
13.5
8.4
1.6
1.8
Species
Density
Picea rubens
Picea rubens
Betula papyifera
13
13.5
1.6
Email: [email protected]
OGSA-DAI views for schema transforms
• View to map from TS2 to TS1:
o
SELECT Date, Site, "Picea rubens", Picrub FROM TS2
UNION
SELECT Date, Site, "Betula papyifera", Betpap FROM TS2
• View to map from TS3 to TS2:
o
SELECT s1.Date, s1.Site, s1.Density, s2.Density
FROM
(SELECT * FROM TS3 WHERE Species="Picea rubens") AS s1
JOIN
(SELECT * FROM TS3 WHERE Species="Betula papyifera") AS
s2
ON s1.Date = s2.Date
Web: www.omii.ac.uk
Email: [email protected]
OGSA-DAI views for security
• Views can be used for security
• A resource can use different views for different
types of user, e.g. nurses and doctors could get
different views on the same data.
• The rows a user can see can be restricted, e.g.
o
SELECT patients.* FROM patients, doctor WHERE
patients.doc_id = doctor.id AND doctor.dn = $DN$
• The views extension to OGSA-DAI will be
released soon.
Web: www.omii.ac.uk
Email: [email protected]
Data linking example 3: SEE-GEO
• SEcurE access to GEOspatial services
o
o
o
http://edina.ac.uk/projects/seesaw/seegeo/index.html
EDINA, MIMAS, NeSC, NCeSS
UK JISC project
• Geographical information systems
• Virtual integration of and access control to
o
o
o
Census data
Borders data
Data hosted by other organisations and exposed as services
• OGSA-DAI for federation of heterogeneous data
resources
Web: www.omii.ac.uk
Email: [email protected]
SEE-GEO – geo-linking service portal
1: GLSQuery
submited via
portal e.g. “Leeds
population
distribution by
census output
area”
GLS
Portal
Maps
5: Portal gets image using URL
4: URL of image is returned to portal – avoids
costly SOAP/HTTP transfer of image
MIMAS
Census
OGSA-DAI
Get
Join
Transform
Deliver
Get
UK
BORDERS
2: Workflow is populated with
query parameters and run
Web: www.omii.ac.uk
Image
Creation
Service
Email: [email protected]
3: Image
is placed
on a map
server
SEE-GEO – client’s view
GeoLinking Service Portal
Leeds population
distribution by census
output area?
Client
Web: www.omii.ac.uk
Email: [email protected]
Security
• OGSA-DAI on Globus Toolkit:
o
o
o
o
Message and transport level authentication, integrity and
encryption.
Message level proxy delegation.
Transport level delegation on TODO list
Authorization based to GT authorization framework
• Plug-in PIPs and PDPs
o
Work done to integrate VOMS with OGSA-DAI as part of OMIIEurope project.
• OGSA-DAI on OMII-UK:
o
o
Transport level security
Currently no support for resource-based authorization.
Web: www.omii.ac.uk
Email: [email protected]
Current status of OGSA-DAI/DQP
• Views functionality released soon
• Continued work on DQP
o
o
Group By not yet supported
Derived tables not yet supported, so views on top of
DQP does not yet work
• Limited support XML-based data linking
o
o
Work done in Japan
Plan to look into this with Arts and Humanities project
• We always endeavour to work closely with
users to fill in any gaps.
Web: www.omii.ac.uk
Email: [email protected]
That’s all folks!
•
•
•
•
OMII-UK booth
ENGAGE
OGSA-DAI website: www.ogsadai.org.uk
Any questions?
Web: www.omii.ac.uk
Email: [email protected]