WIS V-GISC – Simdat WMO REGIONAL SEMINAR OUAGADOUGOU BURKINA FASO 12-13 February 2007, Jacques Roumilhac.


WIS
V-GISC – Simdat
WMO REGIONAL SEMINAR
OUAGADOUGOU BURKINA FASO 12-13 February 2007
Jacques Roumilhac
WIS
WMO's intention to renew its Information System
FWIS: Future WMO Information System
Now simply called WIS (the ‘F’ has been dropped)
Based on core metadata in XML format describing all the data
Renewal of the GTS is part of the approach
Nodes: GISC, DCPC (Data Collection or Production Centre) and NC (National Centre)
GISC: Global Information System Centre
V-GISC: Virtual Global Information System Centre (Met Office, Météo France, DWD, EUMETSAT, ECMWF)
SIMDAT
WIS Functional Requirements
• Support a variety of data types (common to all WMO Programmes)
• Support archive and real-time datasets
• Provide a Catalogue of all the meteorological data exchanged in support of WMO Programmes
• Support ad-hoc requests for data and products: Pull model
• Support routine dissemination of all observed data and products, both real-time and non-real-time: Push model
• Support network security
• Support different user profiles and data policies
• Use different types of communication links (GTS, satellite, dedicated links)
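The pull and push models in this list differ in who initiates the transfer. The Python sketch below contrasts them on a single toy node, assuming in-memory storage and plain lists as user inboxes. The class and method names (`InformationNode`, `subscribe`, `publish`, `request`) are hypothetical and do not come from the WIS specification.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Product:
    """A hypothetical product: a dataset identifier plus its payload."""
    dataset_id: str
    payload: bytes


@dataclass
class InformationNode:
    """Toy node supporting both exchange models listed above (illustrative only)."""
    # dataset_id -> most recent Product held by the node
    store: dict = field(default_factory=dict)
    # dataset_id -> list of subscriber inboxes (plain lists in this sketch)
    subscriptions: dict = field(default_factory=lambda: defaultdict(list))

    def subscribe(self, dataset_id: str, inbox: list) -> None:
        # Push model: the user registers once and then receives every new product.
        self.subscriptions[dataset_id].append(inbox)

    def publish(self, product: Product) -> int:
        # New data reaches the node: keep it for later pulls, push it to the
        # subscribers, and return how many subscribers received it.
        self.store[product.dataset_id] = product
        inboxes = self.subscriptions.get(product.dataset_id, [])
        for inbox in inboxes:
            inbox.append(product)
        return len(inboxes)

    def request(self, dataset_id: str):
        # Pull model: an ad-hoc request returns what the node currently holds, or None.
        return self.store.get(dataset_id)
```

A subscriber receives each new product as soon as `publish` runs, while an ad-hoc user only sees the latest product when calling `request`. A real WIS node would of course persist data and deliver over GTS, satellite or dedicated links rather than Python lists.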
SIMDAT
• European project on Grid technology
• In 2004, it was decided to build a demonstrator V-GISC on the SIMDAT backbone
• SIMDAT focuses on 4 application areas (automotive, aerospace, pharmacology and meteorology):
– product design in automotive and aerospace,
– process design in pharmacology,
– service provision in meteorology
Phase 1: Connectivity
Phase 2: Interoperability
Phase 3: Knowledge
Activities across the three phases:
– Deployment of infrastructure
– Virtual Data Repository
– Integration of analysis, with particular attention to data transport and management
– Distributed data access
– Introduction of Grid technologies research
– Introduction of VO services, workflows, discovery and data mining
Meteorology Application: Project Aims
• Service-oriented framework targeting meteorology, hydrology, climate and environment, offering transparent access to distributed resources
– Discovery service, cataloguing service, subscription service, …
• Some key elements of the project are:
– A single view of the meteorological information distributed amongst the meteorological partners
– Improved visibility of, and access to, meteorological data through a comprehensive discovery service
– A variety of reliable services for collecting and sharing data and for routine dissemination (future)
– A global access control policy managed by the partners and integrated into their existing security infrastructure (future)
Architecture
• 3 main components build the virtual database: Data Repository, Catalogue Node and Portal
– Installed on each partner site and interconnected through a dedicated secure connection channel
• Data Repository
– Interface to the partners' databases
– Offers metadata to describe, search and locate data
– Offers an interface to retrieve data from the associated local databases
• Catalogue Node
– Maintains the registry and ensures synchronisation
– Harvests metadata and requests data from the Data Repository
– Ingests data and maintains the cache of real-time data
– Serves clients: the Portal or other Nodes
– Monitors the execution of requests
• Distributed Portal
– Offers an interface to search/browse the catalogue
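The Catalogue Node's data path (serve a client, answer from the real-time cache when possible, otherwise request the data from the Data Repository, and keep track of requests) can be sketched in a few lines. This is a minimal illustration assuming a plain Python callable stands in for the Data Repository interface; `CatalogueNode`, `ingest`, `serve` and the `log` list are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


class CatalogueNode:
    """Hypothetical sketch of a Catalogue Node's data path.

    Real-time data ingested by the node is kept in a cache; any other request
    is forwarded to the local Data Repository through `repository_fetch`.
    """

    def __init__(self, repository_fetch: Callable[[str], bytes]):
        self._fetch = repository_fetch          # stand-in for the Data Repository interface
        self._cache: Dict[str, bytes] = {}      # cache of real-time data, by dataset id
        self.log: List[Tuple[str, str]] = []    # minimal monitoring: (dataset id, source)

    def ingest(self, dataset_id: str, data: bytes) -> None:
        # Ingest real-time data (e.g. from the GTS) into the cache.
        self._cache[dataset_id] = data

    def serve(self, dataset_id: str) -> bytes:
        # Serve a client (a Portal or another Node): cache first, repository otherwise.
        if dataset_id in self._cache:
            self.log.append((dataset_id, "cache"))
            return self._cache[dataset_id]
        data = self._fetch(dataset_id)          # archive data is not cached in this sketch
        self.log.append((dataset_id, "repository"))
        return data
```

Keeping only real-time data in the cache mirrors the slide's wording; whether archive retrievals should also be cached is a design choice the sketch leaves out.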
Architecture (cont.)
Support a variety of data types
• Interface to the existing meteorological databases
– Provides access to any kind of database (RDBMS, bespoke, flat files)
• Metadata provider
– Provides metadata to discover, locate and describe data, conforming to a defined XML metadata format
– Answers metadata harvesting messages from the Catalogue Node
• Data provider
– Provides an interface to request data asynchronously from the associated existing database
– Translates the XML data request into a request on the actual database
– Offers a data channel (HTTP, FTP, …) to send the retrieved data to the Catalogue Node
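The Data provider's translation step (XML data request in, native database query out) might look like the Python sketch below. The request schema, table names, column names and stored values are all invented for illustration, since the slides do not define them; an in-memory SQLite database stands in for a partner's existing database. Request fields are mapped through whitelists and values are bound as query parameters, so user input never becomes part of the SQL text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical request document; the real SIMDAT request schema is not shown in the slides.
REQUEST = """
<dataRequest dataset="synop">
  <parameter>temperature</parameter>
  <station>Heathrow</station>
  <from>2005-10-12T00:00</from>
  <to>2005-10-12T12:00</to>
</dataRequest>
"""

# Whitelists mapping request vocabulary onto the local schema (invented names).
TABLES = {"synop": "observations"}
COLUMNS = {"temperature": "temp_c", "pressure": "pres_hpa"}


def to_sql(xml_request: str):
    """Translate an XML data request into a parameterised SQL query and its values."""
    root = ET.fromstring(xml_request.strip())
    table = TABLES[root.get("dataset")]
    column = COLUMNS[root.findtext("parameter")]
    # Identifiers come only from the whitelists; user values are bound as parameters.
    sql = (f"SELECT obs_time, {column} FROM {table} "
           "WHERE station = ? AND obs_time BETWEEN ? AND ? ORDER BY obs_time")
    params = (root.findtext("station"), root.findtext("from"), root.findtext("to"))
    return sql, params


def demo_database() -> sqlite3.Connection:
    """In-memory stand-in for a partner's existing database (made-up values)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE observations "
               "(station TEXT, obs_time TEXT, temp_c REAL, pres_hpa REAL)")
    db.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", [
        ("Heathrow", "2005-10-12T06:00", 11.2, 1021.0),
        ("Heathrow", "2005-10-12T18:00", 14.8, 1019.5),
        ("Gatwick", "2005-10-12T06:00", 10.1, 1020.7),
    ])
    return db
```

Running `demo_database().execute(*to_sql(REQUEST)).fetchall()` returns only the Heathrow temperature inside the requested window. Because the request parameters also sub-select the data, the same translation covers retrieval and sub-selection.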
Support a variety of data types
Community Portal / Catalogue: more than 27,000 datasets discoverable, including:
– Satellite data
– Model output
– Climate time series
– Oceanographic data (BATHY, SHIP)
– ERA40 data
– TIGGE data
– Real-time GTS data
– Wave observations
– Observations
– Aviation data (TAF, METAR)
– Lightning data
Catalogue of all available products
• The Catalogue is built from the metadata harvested from the Data Repositories
• The Catalogue is synchronized and replicated on each Catalogue Node
• The Catalogue Node offers discovery services, accessible to the user through the distributed portal
• The Catalogue contains the information needed to retrieve and sub-select the data
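Discovery over the replicated Catalogue combines the three kinds of search the Portal offers: full text, temporal and geographical. A minimal Python sketch follows, assuming each catalogue entry carries a text field, a time extent and a bounding box. The `Entry` fields and the `search` function are hypothetical simplifications of a WMO Core record, and the bounding-box test ignores boxes that cross the dateline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass(frozen=True)
class Entry:
    """Hypothetical catalogue entry: a tiny subset of a WMO Core metadata record."""
    dataset_id: str
    text: str      # title and abstract, used for full-text search
    start: date    # temporal extent
    end: date
    bbox: BBox     # geographical extent


def search(catalogue: Iterable[Entry], words: Sequence[str] = (),
           period: Optional[Tuple[date, date]] = None,
           area: Optional[BBox] = None) -> List[str]:
    """Return the ids of entries that satisfy every criterion given."""
    hits = []
    for entry in catalogue:
        text = entry.text.lower()
        # Full text: every query word must occur in the entry's text.
        if any(word.lower() not in text for word in words):
            continue
        # Temporal: the entry's extent must overlap the requested period.
        if period and (entry.end < period[0] or entry.start > period[1]):
            continue
        # Geographical: the boxes must intersect (dateline crossing not handled).
        if area:
            west, south, east, north = area
            e_west, e_south, e_east, e_north = entry.bbox
            if e_east < west or e_west > east or e_north < south or e_south > north:
                continue
        hits.append(entry.dataset_id)
    return hits
```

Since every Catalogue Node holds a replica, the same query gives the same answer whichever site the user's Portal runs on.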
WMO Core metadata standard - Challenges
• WMO Core Profile: a profile of ISO 19115 for geo-referenced data
• Scalability
– Records are large and contain redundant information, which slows down the database hosting the catalogue
– The same information is repeated in every metadata record, so unnecessary information circulates over the network
– Some documents are orders of magnitude larger than the data itself
– Very large archives cannot be represented at fine granularity
• Does not cover everything needed to build the V-GISC, which also requires:
– Information on how to retrieve data from local databases
– Information to create a directory (taxonomy of documents)
– Information to sub-select data from a dataset
WMO Core metadata standard - Solutions
• Split XML documents into fragments to solve the scalability issue
– WMO core metadata is structured
– Some parts are shared amongst many documents
– Example: a record is split into Core (WMO), Owner (UKMO), Data type (Synop), Location (Heathrow) and Date (2005-10-12) fragments
• Add a specific extension defining all the information needed to implement the system that the WMO core does not define:
– Internal unique ID
– Hierarchy relationship
– Physical location (which node holds the data)
– Information used to generate a valid request to retrieve data from the end system
– Information used to create the web interface for the end user
• Work with the WMO ET to integrate the extensions into future releases of the standard
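The fragment idea can be illustrated with the Heathrow Synop example: the Core, Owner, Data type and Location parts are stored once and referenced by id, while each record keeps only what is specific to it. The Python sketch below assumes JSON instead of XML for brevity; the fragment ids, field names and contents are hypothetical, and the `id`, `parent` and `node` fields only mimic the SIMDAT extensions listed above.

```python
import json
from typing import Dict, List

# Shared fragments, stored and transmitted once, referenced by id (invented ids and content).
FRAGMENTS: Dict[str, dict] = {
    "core/wmo": {"standard": "WMO Core Profile (ISO 19115)"},
    "owner/ukmo": {"organisation": "UKMO"},
    "type/synop": {"dataType": "Synop"},
    "location/heathrow": {"station": "Heathrow"},
}


def make_record(day: str) -> dict:
    """Fragmented record: only the date is specific; shared parts are references.

    The id/parent/node fields mimic the SIMDAT extensions (internal unique ID,
    hierarchy relationship, physical location); their names are hypothetical.
    """
    return {
        "id": f"ukmo-synop-heathrow-{day}",   # internal unique ID
        "parent": "ukmo-synop-heathrow",      # hierarchy relationship
        "node": "UKMO",                       # which node holds the data
        "fragments": ["core/wmo", "owner/ukmo", "type/synop", "location/heathrow"],
        "date": day,
    }


def expand(record: dict) -> dict:
    """Rebuild the full, self-contained document by inlining the shared fragments."""
    full = {key: value for key, value in record.items() if key != "fragments"}
    for ref in record["fragments"]:
        full.update(FRAGMENTS[ref])
    return full


def sizes(records: List[dict]):
    """Characters sent with fragments (records + fragments once) vs. full documents."""
    fragmented = sum(len(json.dumps(r)) for r in records) + len(json.dumps(FRAGMENTS))
    full = sum(len(json.dumps(expand(r))) for r in records)
    return fragmented, full
```

In this toy the shared fragments are only a few dozen characters, so the saving per record is small. With full ISO 19115 records, where the shared parts are far larger than the date-specific part, the same structure removes most of the repeated content from the network and from the catalogue database.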
Metadata Synchronization
• A new observation is received by one site
• The associated metadata are generated and published in the Data Repository
• The Catalogue Node harvests the new metadata and stores it in its Catalogue
• The Catalogues of the other Nodes are synchronized, and the dataset becomes searchable from any site
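The four synchronization steps can be traced with a small in-memory model. The Python sketch below is a toy, assuming incremental harvesting by position in the repository list (a real harvester would more likely use timestamps or change tokens) and direct replication between catalogues; `Site`, `receive_observation`, `harvest` and `synchronise` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    """One partner site: a Data Repository plus its Catalogue Node (toy model)."""
    name: str
    repository: List[dict] = field(default_factory=list)      # published metadata records
    catalogue: Dict[str, dict] = field(default_factory=dict)  # dataset id -> metadata
    harvested: int = 0                                        # repository records already harvested

    def receive_observation(self, dataset_id: str, title: str) -> None:
        # Steps 1-2: new data arrives; its metadata is generated and published locally.
        self.repository.append({"id": dataset_id, "title": title, "holder": self.name})

    def harvest(self) -> List[dict]:
        # Step 3: the Catalogue Node harvests only the records it has not seen yet.
        new = self.repository[self.harvested:]
        self.harvested = len(self.repository)
        for record in new:
            self.catalogue[record["id"]] = record
        return new


def synchronise(sites: List[Site]) -> None:
    """Step 4: each node harvests its repository and replicates new records to the others."""
    for site in sites:
        for record in site.harvest():
            for other in sites:
                other.catalogue.setdefault(record["id"], dict(record))
```

After one call to `synchronise`, a record published at one site is in every catalogue, and its `holder` field still tells the other nodes where the data physically lives.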
Support Archive and Real-time Data
• A GTS Data Repository has been developed
– Interfaced with the GTS (through an MSS, Message Switching System)
– It publishes GTS collections in the Cache
– Currently, no data replication over the SIMDAT infrastructure
• For Phase III, several sources are plugged into SIMDAT
– Strategy to uniquely identify the datasets (using MD5 hash codes)
– Real-time data replication using the metadata synchronization mechanism
– Generic solution that can be used by all the partners
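One way an MD5-based identifier can support replication is to derive the id from the content itself: the same bulletin arriving from two sources then gets the same id, and the second copy can be skipped. The Python sketch below assumes the hash is computed over the bulletin bytes after normalising GTS line endings (CR CR LF); what SIMDAT actually hashes is not stated on the slide, so both the input and the `dataset_id`/`ReplicaStore` names are hypothetical.

```python
import hashlib


def dataset_id(bulletin: bytes) -> str:
    """Hypothetical identifier: MD5 of the bulletin with line endings normalised.

    GTS bulletins use CR CR LF line endings; normalising them means the same
    bulletin relayed with different line endings still gets the same id.
    """
    normalised = bulletin.replace(b"\r\r\n", b"\n").replace(b"\r\n", b"\n").strip()
    return hashlib.md5(normalised).hexdigest()


class ReplicaStore:
    """Keeps one copy per identifier, whichever source delivers it first."""

    def __init__(self):
        self.items = {}  # id -> (source, bulletin)

    def add(self, bulletin: bytes, source: str) -> bool:
        key = dataset_id(bulletin)
        if key in self.items:
            return False  # already held: duplicate delivered by another source
        self.items[key] = (source, bulletin)
        return True
```

MD5 is adequate for recognising identical content, but not as a security measure against deliberately crafted collisions, so access control still has to rely on the AuthN/AuthZ mechanisms.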
Support Pull model
• A Portal is deployed on each site and offers a single view of all the available datasets
• The Portal offers discovery mechanisms to the users
– Full text, temporal and geographical search (Google-like)
– Directory browsing (Yahoo-like)
• The Portal provides request handling mechanisms to the users
– Submitted requests can be asynchronous, to manage long-lived requests
– Users can manage requests (check status, delete them …)
– Users retrieve the associated data when the request is complete
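The asynchronous request lifecycle (submit, check status, delete, retrieve when complete) maps naturally onto futures. The Python sketch below assumes a thread pool and a caller-supplied `retrieve` function standing in for the actual data retrieval; `RequestManager` and its method names are hypothetical, not the SIMDAT Portal API.

```python
import itertools
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class RequestManager:
    """Hypothetical Portal-side manager for asynchronous, long-lived data requests."""

    def __init__(self, retrieve: Callable[[Any], Any], workers: int = 4):
        self._retrieve = retrieve                    # the slow retrieval (stand-in)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ids = itertools.count(1)
        self._requests: Dict[int, Future] = {}       # request id -> Future

    def submit(self, query: Any) -> int:
        # Return a request id at once; the retrieval runs in the background.
        request_id = next(self._ids)
        self._requests[request_id] = self._pool.submit(self._retrieve, query)
        return request_id

    def status(self, request_id: int) -> str:
        future = self._requests[request_id]
        if future.cancelled():
            return "cancelled"
        if future.done():
            return "failed" if future.exception() is not None else "complete"
        return "running" if future.running() else "queued"

    def delete(self, request_id: int) -> None:
        # Forget the request; cancel() only succeeds if it has not started yet.
        self._requests.pop(request_id).cancel()

    def fetch(self, request_id: int, timeout: Optional[float] = None) -> Any:
        # Retrieve the data once complete; blocks up to `timeout` seconds and
        # re-raises the retrieval error if the request failed.
        return self._requests[request_id].result(timeout=timeout)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

The user gets a request id immediately and can poll `status` instead of holding a connection open, which is what makes long-lived archive requests manageable. A production Portal would also persist request state so it survives a restart.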
Support of different user profiles and data policies
• VO Domain
– Group of organisations that share a common data access policy (e.g. the RA-VI V-GISC)
– Access to protected resources occurs on a domain basis
• Authentication (AuthN)
– Users register with a node
– Users are known to all the nodes in the same domain
– Any node within the domain should be able to authenticate a user of the domain
• Authorisation (AuthZ)
– AuthZ is performed at the node level to allow/deny access to the data
– The data access policy is expressed within the metadata
• Implementation: first release in March 2007
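The split between domain-wide authentication and node-level authorisation can be sketched as follows. This is a minimal illustration assuming a single shared registry stands in for the replication of user records between the nodes of a domain, credential checking is omitted, and the policy is carried in a hypothetical `access` field of the metadata (no field means public data). `Domain`, `Node` and the role names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Domain:
    """A VO domain: member nodes share one view of users and one access policy.

    A single in-memory registry stands in for the replication of user records
    between nodes; credential checking is omitted.
    """
    users: Dict[str, Tuple[str, Set[str]]] = field(default_factory=dict)

    def register(self, user: str, home_node: str, roles: Set[str]) -> None:
        # A user registers with one node but becomes known to the whole domain.
        self.users[user] = (home_node, set(roles))

    def authenticate(self, user: str) -> Optional[Tuple[str, Set[str]]]:
        # Any node of the domain can authenticate any user of the domain.
        return self.users.get(user)


@dataclass
class Node:
    name: str
    domain: Domain

    def authorise(self, user: str, metadata: dict) -> bool:
        """AuthZ at the node holding the data, driven by the policy in the metadata."""
        allowed = set(metadata.get("access", ["public"]))  # hypothetical policy field
        if "public" in allowed:
            return True
        identity = self.domain.authenticate(user)
        if identity is None:
            return False
        _, roles = identity
        return bool(roles & allowed)
```

A user registered at one node can therefore be authenticated by another node of the same domain, while the decision to release a particular dataset stays with the node that holds it.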