V-GISC/SIMDAT project – a Virtual GISC
Alfred Hofstadler, Matteo Dell’Acqua, ECMWF
SIMDAT TMB, 15 December 2004
Project History:
May 2002: Thirteenth session WMO Regional Association VI: “…agreed that the concept of a Virtual GISC had merit…”
June 2002: V-GISC in RA-VI Kick-off Meeting
Partners: DWD, Meteo-France, UK Met Office, EUMETSAT, ECMWF
Steering Group + 4 working groups: Policy, Data, Communications, Dissemination/Acquisition
2003: SIMDAT project proposal submitted to EU
1 September 2004: contract with EU is signed
October 2004: V-GISC steering group decides to move V-GISC development into the SIMDAT project
November 2004: SIMDAT Kick-off meeting
4 V-GISC working groups are mapped onto SIMDAT working groups: Virtual Organisation, Ontologies, GRID Infrastructure, Access to Distributed Data
February 2005: First (V-)GISC-demonstrator at CBS
SIMDAT - Introduction
Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery
4-year project funded by the EU
Contract with EU was signed on 1 September 2004
SIMDAT focuses on 4 applications
Product design in automotive and aerospace
Process design in pharmacology
Service provision in meteorology
Objective of SIMDAT is to use data grid technology to resolve a complex problem for each of the 4 applications
Budget of 11 M€, of which 10.5% is for the meteorological activity
320 man-months, taking into account EU funding and the contribution from the partners
SIMDAT - Strategy
7 Grid-technology areas have been identified to achieve the SIMDAT objectives
Integrated Grid infrastructure offering basic services to applications
Access to data distributed on Grid sites
Management of Virtual Organisation
Ontologies
Integration of analysis services
Workflows
Knowledge Services
Phase 1: Connectivity, Phase 2: Interoperability, Phase 3: Knowledge
Deployment of Grid infrastructure with particular attention to data transport and management
Distributed DB access
Virtual Data Repository
Workflows
Introduction of grid technologies research for next generation aggregated knowledge capture, discovery and mining
Meteorology application : Project Aims
5 partners: DWD, Meteo-France, UK Met Office, EUMETSAT and ECMWF
3 “potential” GISCs : DWD, Meteo-France, UK Met Office
2 DCPCs : ECMWF, EUMETSAT
Instead of each National Met Service having a GISC (Global Information System Centre), the V-GISC will be seen as a normal GISC and will fulfil the WMO Information System technical requirements
The project will build the foundations of the V-GISC by developing an infrastructure that brings together the data of the partners and provides access to the distributed meteorological databases
Meteorology application : Project Aims
A complex problem: To build a Virtual GISC, an integrated and scalable framework for the collection and sharing of distributed data that will offer:
A single view of meteorological information which is distributed amongst the 5 partners
Improved visibility of and access to meteorological data through a comprehensive discovery service based on metadata development
A variety of reliable delivery services (routine dissemination and collection of data)
A global access control policy managed by the partners and integrated into their existing security infrastructure
Quality of services, reliability and security
Processing services and shared data manipulation facilities
The software developed within the project will be made available to WMO
GRID Technology
Grid technology will be used
To connect the diverse data sources and create a Virtual Database
To enable flexible, secure collaboration through a virtual organisation
Data Grid technology presents an architectural framework that aims to provide access to distributed data in a simple, secure, reliable and scalable manner from a widely distributed set of computers and across various administrative boundaries
The essential characteristics of a Data Grid are:
Reference a dataset by a unique identifier
Discover datasets by attributes
Track multiple copies of a single file, and ultimately locate the "nearest" copy
Move files from one point on the grid to another point (push, pull and third party copy)
The domain of the V-GISC is an ideal candidate to exploit such a framework
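The four Data Grid characteristics listed above can be illustrated with a small in-memory catalogue. This is only a sketch under stated assumptions: the class names, the `urn:example:` identifier, the attribute keys and the link costs are all hypothetical, and a real Data Grid would move actual files rather than just record where copies live.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One logical dataset: unique identifier, attributes, sites holding a copy."""
    dataset_id: str
    attributes: dict
    replicas: set = field(default_factory=set)


class ReplicaCatalogue:
    """Toy catalogue (hypothetical API, not the V-GISC implementation)."""

    def __init__(self, link_costs):
        # link_costs maps frozenset({site_a, site_b}) to an assumed network cost
        self.link_costs = link_costs
        self.datasets = {}

    def register(self, dataset_id, attributes, site):
        """Reference a dataset by a unique identifier and note where it is stored."""
        dataset = self.datasets.setdefault(
            dataset_id, Dataset(dataset_id, dict(attributes)))
        dataset.replicas.add(site)

    def discover(self, **wanted):
        """Discover datasets whose attributes match every requested value."""
        return sorted(
            d.dataset_id for d in self.datasets.values()
            if all(d.attributes.get(k) == v for k, v in wanted.items())
        )

    def copy(self, dataset_id, source, target):
        """Record a copy from source to target (push, pull and third party look alike here)."""
        replicas = self.datasets[dataset_id].replicas
        if source not in replicas:
            raise ValueError(f"{source} holds no copy of {dataset_id}")
        replicas.add(target)

    def nearest_copy(self, dataset_id, client_site):
        """Locate the replica with the lowest assumed cost from the client site."""
        def cost(site):
            if site == client_site:
                return 0
            return self.link_costs.get(frozenset((client_site, site)), float("inf"))
        # sorted() makes the choice deterministic when two sites have equal cost
        return min(sorted(self.datasets[dataset_id].replicas), key=cost)


costs = {
    frozenset(("ECMWF", "DWD")): 3,
    frozenset(("ECMWF", "Meteo-France")): 1,
}
catalogue = ReplicaCatalogue(costs)
catalogue.register("urn:example:t2m", {"parameter": "2m temperature", "origin": "DWD"}, "DWD")
catalogue.copy("urn:example:t2m", "DWD", "Meteo-France")
print(catalogue.discover(origin="DWD"))
print(catalogue.nearest_copy("urn:example:t2m", "ECMWF"))
```

Keeping the identifier separate from the replica locations is what lets a client ask for a dataset without knowing which partner holds it.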
V-GISC infrastructure
Management: User registration, DB admin, Catalogue admin
Security: Authentication, Authorization, Audit
Monitoring: Logging, Control, Error tracking
Interface to offer a single view of the data: Discovery facilities, Request/Subscription
Grid infrastructure for sharing data
Interoperability interfaces for data/metadata exchange
Mechanisms to synchronise metadata
Dissemination/acquisition mechanisms
Meteo requirements
V-GISC Conceptual view
Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization
The Virtual Database Service provides a single view of the partners' databases
V-GISC Conceptual view
Virtual Database
Provide the unified view of all the shared datasets through a distributed catalogue
Maintain the distributed catalogue amongst the partners using synchronization mechanisms
Provide interfaces with the legacy databases
Implement data replication mechanisms
Preserve the integrity of the data
Access Facilities
Collection & Dissemination services that support secure, efficient and reliable transport mechanisms
Quality of Service (QoS): Traffic Prioritization, Queuing mechanisms, Scheduling
Discovery service by browsing the catalogue or using a keyword search engine
Interactive and batch interfaces
VO
Security Services (CA, AuthN, AuthZ, Audit,…)
User management
Data policy management
Monitoring and control
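The "Traffic Prioritization, Queuing mechanisms, Scheduling" item under Access Facilities can be sketched with a priority queue. This is a minimal illustration, assuming that a lower number means a more urgent product and that products of equal priority go out in submission order; the class name, product names and priority values are hypothetical and not taken from the project.

```python
import heapq
import itertools


class DisseminationQueue:
    """Priority queue for outgoing products (hypothetical sketch)."""

    def __init__(self):
        self._heap = []
        # A running counter breaks ties so equal priorities stay first-in, first-out
        self._sequence = itertools.count()

    def submit(self, product, priority):
        """Queue a product; lower priority values are sent first."""
        heapq.heappush(self._heap, (priority, next(self._sequence), product))

    def next_product(self):
        """Remove and return the most urgent queued product."""
        _priority, _sequence, product = heapq.heappop(self._heap)
        return product

    def __len__(self):
        return len(self._heap)


queue = DisseminationQueue()
queue.submit("routine-bulletin", priority=2)
queue.submit("archive-backfill", priority=5)
queue.submit("severe-weather-warning", priority=0)
queue.submit("routine-chart", priority=2)

sent = [queue.next_product() for _ in range(len(queue))]
print(sent)
```

A scheduler built on this would simply call `next_product()` whenever transport capacity becomes free.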
V-GISC Distributed Architecture
A V-GISC node is installed on each partner site
All the nodes are interconnected through a dedicated secure communication channel: the Database Communication Layer (DCL)
All the nodes exchange messages through the DCL
The architecture is decentralized
No central point where all the nodes are declared
No single point of failure
The network of nodes is self-organized
The network dynamically accepts new nodes and is aware of node disconnections
The network organizes its topology and indicates to the entering new nodes their position within the network
No manual intervention is needed on the nodes to accept new peers
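The decentralized, self-organized behaviour described on this slide can be sketched as follows. The slides do not say how the DCL organizes its topology, so this example assumes a simple scheme: a newcomer contacts any one existing member, learns the full membership from it, and every node derives its position from the sorted list of names. Direct method calls stand in for DCL messages, and all names are hypothetical.

```python
class Node:
    """A V-GISC-like node that tracks its peers without any central registry."""

    def __init__(self, name):
        self.name = name
        self.peers = {}  # peer name -> Node, excluding this node itself

    def join(self, contact):
        """Join through any existing member; no node has a special role."""
        # Snapshot the membership first, because the loop updates contact.peers
        members = list(contact.peers.values()) + [contact]
        for member in members:
            member.peers[self.name] = self   # the network learns about the newcomer
            self.peers[member.name] = member  # the newcomer learns about the network

    def leave(self):
        """Disconnect and let the remaining nodes update their view."""
        for peer in self.peers.values():
            peer.peers.pop(self.name, None)
        self.peers.clear()

    def ring(self):
        """Topology as seen by this node: all known names in sorted order."""
        return sorted([self.name, *self.peers])

    def successor(self):
        """Position in the network: the next node around the sorted ring."""
        ring = self.ring()
        return ring[(ring.index(self.name) + 1) % len(ring)]


dwd, mf, ukmo, ecmwf = Node("DWD"), Node("MeteoFrance"), Node("UKMO"), Node("ECMWF")
mf.join(dwd)      # each newcomer uses a different entry point
ukmo.join(mf)
ecmwf.join(ukmo)
print(dwd.ring(), dwd.successor())

mf.leave()
print(ecmwf.ring(), ecmwf.successor())
```

Because every member holds the same view, any node can serve as the entry point for the next newcomer, which is what removes the single point of failure.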
V-GISC Distributed Architecture
V-GISC Node
Each node maintains a copy of the global catalogue describing data available through the V-GISC
The catalogue synchronization is done using the DCL
Each node maintains a cache used to replicate data and to efficiently serve the users
A node is interfaced with the local legacy databases
A node has a Web Portal for interactive access
A node has a Grid/Web Service Portal for batch access and integration of the V-GISC in a bigger Grid
A node implements all services offered by the V-GISC
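The role of the per-node cache in front of the local legacy databases can be sketched as a small least-recently-used cache. The eviction policy, the capacity and the dataset names are assumptions made for illustration; the slides only state that each node maintains a cache used to replicate data and serve users efficiently.

```python
from collections import OrderedDict


class NodeCache:
    """LRU cache in front of a slower data source (hypothetical sketch)."""

    def __init__(self, fetch_from_legacy, capacity=2):
        self._fetch = fetch_from_legacy   # callable: dataset id -> data
        self._capacity = capacity
        self._entries = OrderedDict()     # least recently used entry comes first
        self.legacy_reads = 0             # counts how often the legacy DB was hit

    def get(self, dataset_id):
        if dataset_id in self._entries:
            # Cache hit: mark the entry as most recently used
            self._entries.move_to_end(dataset_id)
            return self._entries[dataset_id]
        # Cache miss: read from the legacy database and keep a copy
        data = self._fetch(dataset_id)
        self.legacy_reads += 1
        self._entries[dataset_id] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


legacy_db = {"obs-1": b"synop", "obs-2": b"temp", "obs-3": b"pilot"}
cache = NodeCache(legacy_db.__getitem__, capacity=2)

cache.get("obs-1")
cache.get("obs-1")   # served from the cache
cache.get("obs-2")
cache.get("obs-3")   # evicts obs-1, the least recently used entry
cache.get("obs-1")   # has to go back to the legacy database
print(cache.legacy_reads)
```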
V-GISC Node - Functional Design
Demonstrator – Functional View
To deploy a flexible infrastructure on top of which the Virtual Information Centre can be built
To use Grid technologies to federate databases located at partner sites
To show the user a single view of data sets stored by at least 3 partners
To get a first implementation of the catalogue based on WMO core metadata
To offer first VO security services
Demonstrator - Design
3 main components to build the virtual database: Data Repository, Catalogue Node and Portal
Installed on each partner site and interconnected through a dedicated secure connection channel
Data Repository
Interface to the partners' databases
Offers metadata information to describe, search, locate data
Offers interface to retrieve data from the associated local databases
Catalogue Node
Maintains the catalogue and ensures synchronisation
Harvests metadata and requests data from the Data Repository
Ingests data and maintains the cache of the V-GISC
Serves clients: Portal or other Nodes
Monitors the execution of the requests
Distributed Portal
Offers interface to search/browse the V-GISC catalogue
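The division of labour between Data Repository and Catalogue Node can be sketched in a few lines. This is a simplified illustration, not the demonstrator's code: class and method names, record identifiers and titles are hypothetical, synchronisation is reduced to merging two dictionaries, and the cache and request monitoring are left out. A portal would sit on top and call `search` and `fetch`.

```python
class DataRepository:
    """Interface to one partner database, modelled here as a dict."""

    def __init__(self, site, records):
        self.site = site
        self._records = records  # record id -> (metadata dict, payload bytes)

    def list_metadata(self):
        """Offer metadata so that data can be described, searched and located."""
        return [{"id": rid, "site": self.site, **meta}
                for rid, (meta, _payload) in self._records.items()]

    def retrieve(self, record_id):
        """Retrieve the data itself from the local database."""
        return self._records[record_id][1]


class CatalogueNode:
    """Keeps a catalogue copy and serves clients (portal or other nodes)."""

    def __init__(self, repository):
        self.repository = repository
        self.catalogue = {}  # record id -> metadata entry
        self.peers = {}      # site name -> CatalogueNode

    def harvest(self):
        """Pull metadata from the local Data Repository into the catalogue."""
        for entry in self.repository.list_metadata():
            self.catalogue[entry["id"]] = entry

    def synchronise_with(self, other):
        """Exchange catalogue entries so both nodes hold the same copy."""
        merged = {**self.catalogue, **other.catalogue}
        self.catalogue, other.catalogue = dict(merged), dict(merged)
        self.peers[other.repository.site] = other
        other.peers[self.repository.site] = self

    def search(self, keyword):
        """Keyword search over the local catalogue copy."""
        keyword = keyword.lower()
        return sorted(entry["id"] for entry in self.catalogue.values()
                      if keyword in entry["title"].lower())

    def fetch(self, record_id):
        """Serve data locally, or request it from the node that holds it."""
        site = self.catalogue[record_id]["site"]
        if site == self.repository.site:
            return self.repository.retrieve(record_id)
        return self.peers[site].fetch(record_id)


dwd = CatalogueNode(DataRepository("DWD", {
    "dwd-1": ({"title": "Surface temperature analysis"}, b"dwd payload"),
}))
mf = CatalogueNode(DataRepository("Meteo-France", {
    "mf-1": ({"title": "Sea temperature forecast"}, b"mf payload"),
    "mf-2": ({"title": "Wind forecast"}, b"wind payload"),
}))
for node in (dwd, mf):
    node.harvest()
dwd.synchronise_with(mf)

print(dwd.search("temperature"))
print(dwd.fetch("mf-1"))
```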
Demonstrator - Architectural Choices
Grid Architecture that can accept any kind of Grid Technology
Free to choose any grid middleware (OGSA-DAI, GRIA, gLite, GT4) and pick the best component of each middleware that meets the V-GISC requirements
Catalogue Node built on a J2EE component framework
Solid framework used in production environment
Includes different services such as persistency, monitoring, configuration, etc
The framework can be seen as a kernel of components where it is easy to add services such as Grid services or Web services
Catalogue duplicated and synchronized on each site
To have a fast discovery phase (browse & search)
To have a reliable system (client redirection to another node in case of problems)
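Because every site holds a synchronized copy of the catalogue, a client can be redirected to another node when one fails. The sketch below illustrates that idea under simple assumptions: nodes are tried in a fixed order, a failing node raises a dedicated exception, and each node is represented by a plain search function. All names are hypothetical.

```python
class NodeUnavailable(Exception):
    """Raised by a node that cannot serve a request."""


def search_with_redirection(nodes, keyword):
    """Try each (name, search function) pair in turn until one node answers."""
    failures = []
    for name, search in nodes:
        try:
            return name, search(keyword)
        except NodeUnavailable as exc:
            failures.append(f"{name}: {exc}")  # remember why this node was skipped
    raise NodeUnavailable("all nodes failed: " + "; ".join(failures))


# Every node searches the same replicated catalogue (assumed contents)
CATALOGUE = ["global temperature analysis", "sea surface temperature", "wind forecast"]


def healthy_search(keyword):
    return [title for title in CATALOGUE if keyword in title]


def broken_search(keyword):
    raise NodeUnavailable("connection refused")


served_by, hits = search_with_redirection(
    [("node-a", broken_search), ("node-b", healthy_search)], "temperature")
print(served_by, hits)
```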
Demonstrator - Architecture
Demonstrator - Deployment
Problems and lessons learned - 1
Grid Middleware
Technology not mature for production environment
Middleware still evolving toward standards (WSRF, WS-I, …)
Access to distributed data
No efficient and robust transport mechanism
No mechanism to duplicate and synchronize data
Difficult to ensure data integrity on huge data volumes
OGSA-DAI is promising, easy to understand and use
Problems and lessons learned - 2
Ontology / Metadata
Meteorological metadata are described using XML WMO-CORE metadata Profile
Metadata description larger than the data
Same information repeated in all metadata records
Unnecessary information is circulating over the network
Large metadata records slowing down the Database hosting the catalogue
Universal request language was not a solution to the virtual database problem
VO
No standard tools to manage users and data policies
No standard security policies
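The metadata problem noted above (the same information repeated in every record) can be made concrete with a small measurement. The sketch below factors the fields shared by all records out into one block and compares serialized sizes. It is only one possible way to look at the redundancy, not the approach chosen by the project, and the field names and values are invented rather than taken from the WMO Core Metadata Profile.

```python
import json


def factor_common_fields(records):
    """Split records into fields shared by all of them and per-record remainders."""
    if not records:
        return {}, []
    first = records[0]
    shared = {key: value for key, value in first.items()
              if all(key in r and r[key] == value for r in records[1:])}
    unique = [{k: v for k, v in r.items() if k not in shared} for r in records]
    return shared, unique


def rebuild(shared, unique):
    """Recreate the full records from the factored form."""
    return [{**shared, **u} for u in unique]


def json_size(obj):
    return len(json.dumps(obj))


records = [
    {"id": "obs-001", "title": "Surface observations 00 UTC", "format": "BUFR",
     "contact": "data-office@example.org", "policy": "example policy text"},
    {"id": "obs-002", "title": "Surface observations 06 UTC", "format": "BUFR",
     "contact": "data-office@example.org", "policy": "example policy text"},
    {"id": "obs-003", "title": "Surface observations 12 UTC", "format": "BUFR",
     "contact": "data-office@example.org", "policy": "example policy text"},
]

shared, unique = factor_common_fields(records)
raw_size = sum(json_size(r) for r in records)
factored_size = json_size(shared) + sum(json_size(u) for u in unique)
print(sorted(shared), raw_size, factored_size)
```

The saving grows with the number of records, since the shared block is stored and transmitted once rather than once per record.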
What’s next
Finalise the Connectivity phase (by M18/Mar 2006)
Connect EUMETSAT to the Grid (M12-M15/Sep-Dec 2005)
Enhance the architecture (M13-M18/Oct 2005-Mar 2006)
Implement Registration Authority (M16-M17/Jan-Feb 2006)
Improve metadata model (M13-M16/Oct 2005-Jan 2006)
Enhance distributed portal (M14-M16/Nov 2005-Jan 2006)
Introduce acquisition of data (M18-M24/Mar-Sep 2006)
Develop subscription service (M20-M28/May 2006-Jan 2007)
Start developing the Virtual Organisation
Monitoring and management of the system (M18-M24/Mar-Sep 2006)
User management and data access control (M24-M30/Sep 2006-Mar 2007)
Develop the discovery mechanism (M20-M25/May-Oct 2006)
Start testing with other potential GISCs
Japan and Australia have expressed interest in joining the SIMDAT project
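The M-numbers in the schedule above are project months. The paired labels (for example M12/Sep 2005 and M18/Mar 2006) are consistent with counting from the contract date of 1 September 2004, so that M<n> falls n months after September 2004. The helper below encodes that inferred convention; it is an assumption drawn from the labels, not something the slides state explicitly.

```python
from datetime import date

# Contract signature date given in the slides; used here as month zero (assumption)
PROJECT_START = date(2004, 9, 1)


def project_month_to_date(month_number):
    """Return the first day of the calendar month for project month M<n>."""
    index = PROJECT_START.year * 12 + (PROJECT_START.month - 1) + month_number
    return date(index // 12, index % 12 + 1, 1)


for label in (12, 18, 24, 30):
    print(f"M{label}", project_month_to_date(label).strftime("%b %Y"))
```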
Global View : Coordination Effort
Metadata
Request-reply mechanism
Exchange of catalogues
Definition of what data should be available and to whom
Virtual Organisation
Standardisation of services
Quality of Service
Security