World Data Center Climate: Status and Portal Integration GO-ESSP at LLNL

World Data Center Climate:
Status and Portal Integration
Michael Lautenschlager,
Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany
GO-ESSP at LLNL
Livermore, June 19th – 21st, 2006
WDCC Home: www.wdcc-climate.de
M.Lautenschlager (WDCC / MPI-M) / 15.06.06 / 1
Content:
WDCC Status
CERA Concept
Portal Integration
WDCC Content
June 2006: 590 Experiments / 79,000 Data Sets
Data from Earth System Modelling and Related Observations
WOCE
ERA40
BALTEX
CARIBIC
GEBCO
HOAPS
IPCC
NCEP
EH5/MPI-OM
IPCC-AR4
CEOP
COSMOS
ERA15/40
Simulations @ MPI, GKSS,…
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ)
Data Export from WDC Climate
Export volume: 2 – 10 TB/month
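For a rough sense of scale, the export volume can be converted into an average transfer rate. This worked conversion assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a 30-day month; the results are monthly averages, not peak rates.

```latex
\[
30\ \text{days} = 30 \times 86\,400\ \text{s} = 2.592 \times 10^{6}\ \text{s}
\]
\[
\frac{2 \times 10^{12}\ \text{B}}{2.592 \times 10^{6}\ \text{s}} \approx 0.77\ \text{MB/s},
\qquad
\frac{10 \times 10^{12}\ \text{B}}{2.592 \times 10^{6}\ \text{s}} \approx 3.86\ \text{MB/s}
\]
```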
Geographical Distribution of WDCC Users
Total number of registered users: 750 (May 2006)
Data Import into WDC Climate
ECHAM5/MPI-OM IPCC AR4 Scenarios (ca. 110 TB)
6 × 10⁹ BLOBs
CERA¹ Concept:
Semantic Data Management
(I) Data catalogue and pointers to Unix files
• Enable search and identification of data
• Allow access to the data as they are (coarse-granularity raw data files)
(II) Application-oriented data storage in BLOB tables
• Time series of individual variables are stored as BLOB entries in DB tables (fine-granularity data products), which allows fast and selective data access
• Storage in standard data formats (GRIB, NetCDF/CF), which allows standard data processing routines (PINGOs, CDOs) to be applied
¹ Climate and Environmental data Retrieval and Archiving
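The BLOB-table storage in (II) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: sqlite3 only stands in for the CERA database, and the table name `t2m` and the placeholder byte strings are hypothetical. Each variable gets its own table with one BLOB per time step, so a request for a few time steps reads only those rows instead of a whole raw model output file.

```python
import sqlite3

# Hypothetical schema: one table per variable, one BLOB row per time step.
# sqlite3 is only a stand-in for the CERA database used at WDCC.


def create_variable_table(conn: sqlite3.Connection, table: str) -> None:
    """Create a BLOB table holding the time series of one variable."""
    # The table name is interpolated, so it must be a trusted identifier.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "time_step INTEGER PRIMARY KEY, "  # position along the model run time
        "record BLOB NOT NULL)"            # one encoded 2D field, e.g. a GRIB record
    )


def store_record(conn: sqlite3.Connection, table: str, time_step: int, record: bytes) -> None:
    """Store one time step of the variable as a BLOB entry."""
    conn.execute(
        f"INSERT INTO {table} (time_step, record) VALUES (?, ?)", (time_step, record)
    )


def read_range(conn: sqlite3.Connection, table: str, first: int, last: int) -> list:
    """Selective access: return only the BLOBs for time steps first..last."""
    rows = conn.execute(
        f"SELECT record FROM {table} WHERE time_step BETWEEN ? AND ? ORDER BY time_step",
        (first, last),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_variable_table(conn, "t2m")  # hypothetical table for 2 m temperature
for step in range(1, 11):
    store_record(conn, "t2m", step, f"field-{step}".encode())  # placeholder payloads
conn.commit()

print(read_range(conn, "t2m", 3, 5))  # [b'field-3', b'field-4', b'field-5']
```

Because the time step is an `INTEGER PRIMARY KEY` (an alias for SQLite's rowid), the range query is answered from the key order rather than by scanning every stored field.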
WDCC Data Topology
[Figure: an Experiment Description with a pointer to Unix files and Dataset Descriptions 1 … n, each dataset linked to its own BLOB data table]
Level 1 interface: metadata entries (XML, ASCII) + data files
Level 2 interface: separate files containing BLOB table data in an application-adapted structure (time series of single variables)
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
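The two-level topology and the "BLOB table as virtual file" idea can be modeled with a few dataclasses. This is a hypothetical sketch, not the CERA schema: the class names, the experiment name `EXP1` and the file path are invented for illustration. The virtual file is simply the BLOBs of one table concatenated in time order; this yields a valid file for record-oriented formats such as GRIB, where a file is a sequence of self-contained messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlobTable:
    """Level 2: time series of one variable, one BLOB per time step."""
    records: Dict[int, bytes] = field(default_factory=dict)

    def virtual_file(self) -> bytes:
        # Concatenating the BLOBs in time order yields the byte stream of a
        # file that is never stored as such on disk.
        return b"".join(self.records[t] for t in sorted(self.records))


@dataclass
class DatasetDescription:
    name: str
    blobs: BlobTable = field(default_factory=BlobTable)


@dataclass
class ExperimentDescription:
    """Level 1: experiment metadata plus pointers to the raw Unix files."""
    name: str
    unix_files: List[str]
    datasets: List[DatasetDescription] = field(default_factory=list)


# Hypothetical experiment, dataset name and file path, for illustration only.
exp = ExperimentDescription("EXP1", unix_files=["/archive/exp1/raw_0001"])
t2m = DatasetDescription("EXP1_T2M")
t2m.blobs.records[2] = b"<record 2>"
t2m.blobs.records[1] = b"<record 1>"
exp.datasets.append(t2m)

print(exp.datasets[0].blobs.virtual_file())  # b'<record 1><record 2>'
```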
CERA Data Model
[Figure: CERA data model blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Org, Data Access]
Data Matrix of a Model Experiment
[Figure: model variables (2D variables: T2M, Precip, SLP, …; 3D variables: Temp, Water vapour, …) against model run time (T1, T2, T3, …, Tn, Tend); raw data files in the DKRZ archive; each column is one BLOB table in the CERA-DB]
2D: small BLOBs (180 KB)
3D: large BLOBs (3 MB)
Raw data file: direct model output (1.3 – 16.2 GB)
Climate Model Data Structures
Preferred DB-storage structure for web-based access:
• single variable
• single level
• time series of 2D gridded data records
• Formats: GRIB-1, NetCDF/CF (GRIB-2)
[Figure: original data structure (4-D) → application-related data structure (2-D)]
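The regrouping from the original 4-D structure (two horizontal dimensions, level and time) into single-variable, single-level time series of 2D records can be sketched as below. The in-memory layout is a hypothetical simplification: fields are flattened to short lists, surface (2D) variables are assumed to sit at level 0, and real records would be GRIB or NetCDF/CF encoded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory layout of one raw output time step:
# variable name -> model level -> flattened 2D field.
# Surface (2D) variables are stored at level 0 by assumption.
RawStep = Dict[str, Dict[int, List[float]]]
Series = Dict[Tuple[str, int], List[Tuple[int, List[float]]]]


def to_time_series(raw_steps: List[Tuple[int, RawStep]]) -> Series:
    """Regroup time-step-major raw output into single-variable,
    single-level time series of 2D records."""
    series: Series = defaultdict(list)
    # Process time steps in order so every series comes out sorted in time.
    for time_step, variables in sorted(raw_steps, key=lambda item: item[0]):
        for name, levels in variables.items():
            for level, field2d in levels.items():
                series[(name, level)].append((time_step, field2d))
    return dict(series)


raw = [
    (2, {"T2M": {0: [2.0, 2.1]}, "Temp": {1: [250.0, 251.0], 2: [260.0, 261.0]}}),
    (1, {"T2M": {0: [1.0, 1.1]}, "Temp": {1: [249.0, 250.0], 2: [259.0, 260.0]}}),
]
series = to_time_series(raw)
print(series[("Temp", 2)])  # [(1, [259.0, 260.0]), (2, [260.0, 261.0])]
```

Each key of the result corresponds to one column of the data matrix, i.e. one candidate BLOB table, and each list element to one BLOB.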
DKRZ Architecture
[Figure: DKRZ architecture diagram; label: TX7: Intel Itanium-2 with Linux]
Portal Integration
Two strategies:
One-way integration: discovery and use metadata are integrated into a central data portal in one step
Example:
C3Grid data catalogue (see the presentation by Heinrich Widmann)
Two-way integration: discovery metadata are integrated into a central data portal; use metadata are extracted from the remote archive when they are needed for data download and processing
Examples:
Primary data publication in the TIB library catalogue (STD-DOI)
WDCC integration in NDG (NERC Data Grid)
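The two-way strategy can be sketched as a portal that stores and searches only discovery metadata and asks the owning archive for use metadata on demand. All class, function and record names here are hypothetical; the remote archive is simulated by a plain function so the flow can be followed without a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DiscoveryRecord:
    """Discovery metadata: what the central portal stores and searches."""
    identifier: str
    title: str
    archive: str  # name of the remote archive that holds the use metadata


class TwoWayPortal:
    def __init__(self, archives: Dict[str, Callable[[str], dict]]):
        # Maps an archive name to a function returning the use metadata for
        # an identifier; a stand-in for a query against the remote archive.
        self.archives = archives
        self.catalogue: Dict[str, DiscoveryRecord] = {}

    def ingest(self, record: DiscoveryRecord) -> None:
        # Only discovery metadata enter the central catalogue.
        self.catalogue[record.identifier] = record

    def search(self, text: str) -> List[DiscoveryRecord]:
        needle = text.lower()
        return [r for r in self.catalogue.values() if needle in r.title.lower()]

    def use_metadata(self, identifier: str) -> dict:
        # Use metadata are fetched from the owning archive on demand,
        # e.g. right before a download or processing step.
        record = self.catalogue[identifier]
        return self.archives[record.archive](identifier)


calls = []


def wdcc_lookup(identifier: str) -> dict:
    """Hypothetical remote lookup; records each call for illustration."""
    calls.append(identifier)
    return {"identifier": identifier, "format": "GRIB"}


portal = TwoWayPortal({"WDCC": wdcc_lookup})
portal.ingest(DiscoveryRecord("EXP1_T2M", "ECHAM5 2 m temperature", "WDCC"))
hits = portal.search("echam5")  # no remote call yet: calls == []
print(portal.use_metadata(hits[0].identifier))
```

In the one-way strategy, `ingest` would receive discovery and use metadata together and `use_metadata` would read from the central catalogue instead of calling the archive.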
Primary data publication (STD-DOI)
URL: http://www.std-doi.de/
[Figure: primary data publication process, including data review; ISO 690-2: metadata for citation of electronic media]
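Since ISO 690-2 governs how electronic resources are cited, a published dataset can be turned into a citation string plus a resolvable DOI link. This is a simplified sketch: the element order is only loosely modeled on ISO 690-2 (consult the standard for exact punctuation), and the creators, title and DOI `10.1234/EXAMPLE` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPublication:
    creators: List[str]
    title: str
    publisher: str
    year: int
    doi: str


def resolver_url(doi: str) -> str:
    # DOI names resolve through the doi.org proxy server.
    return "https://doi.org/" + doi


def citation(pub: DataPublication) -> str:
    # Simplified element order loosely modeled on ISO 690-2: creators, title,
    # type of medium, publisher, year, identifier, availability.
    authors = "; ".join(pub.creators)
    return (
        f"{authors}: {pub.title} [online]. {pub.publisher}, {pub.year}. "
        f"DOI: {pub.doi}. Available from: {resolver_url(pub.doi)}"
    )


# Hypothetical record, for illustration only.
pub = DataPublication(
    ["Doe, J.", "Roe, R."], "Example climate model data", "WDCC", 2006, "10.1234/EXAMPLE"
)
print(citation(pub))
```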
Example: Publ.-DOI from WDCC
[Screenshots of a WDCC data publication entry; labels: DOI, URN, Publ.-DOI, 830 GB, Ident.-DOI]
The data retrieval procedure is given at the end (user identification is required).
WDCC Metadata and OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting
WDCC OAI server at: http://uranus.dkrz.de:8080/oai/provider
(Software: dlese (www.dlese.org) + apache-tomcat 5.5.12 + Java 1.5)
- 35 IPCC experiments with more than 11,000 datasets, metadata format ISO 19115, for C3Grid (http://gsphere.awi.de:8080/gridsphere/gridsphere)
- 40 STD-DOI experiments with more than 1,700 datasets, metadata format DIF, for GO-ESSP (NDG, http://ndg.badc.rl.ac.uk/)
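An OAI-PMH harvest against a server like the one above consists of `ListRecords` requests and responses that may end with a `resumptionToken` for the next page. The sketch below builds request URLs and parses one response page without any network access. The `metadataPrefix` value `dif` is an assumption (a server lists its actual prefixes in response to the `ListMetadataFormats` verb), and the sample response is invented.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A follow-up request carries only the token (an exclusive argument).
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("metadata_prefix or resumption_token is required")
    return base_url + "?" + urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers of one response page and the resumption
    token for the next page (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://uranus.dkrz.de:8080/oai/provider", metadata_prefix="dif"))
print(parse_page(SAMPLE))  # (['oai:example:1', 'oai:example:2'], 'page2')
```

An empty `resumptionToken` element marks the last page of a list, which is why an element without text is mapped to `None`.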
NDG OAI Harvesting (Pull or Notification)
[Figure: the WDCC OAI server (software: dlese) delivers DIF XMLs to the NDG OAI client (dlese); further providers (Provider 2 with OAI Server 2, …, OAI Server n) deliver DIF XMLs the same way; the harvested records 1…n are processed into the NDG catalog, which feeds the NDG Discovery Portal]
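The pull mode of the harvesting chain, one client collecting records from several OAI servers, can be sketched as a loop that follows resumption tokens per provider. This covers pull only, not notification. The `fetch` function is injected so the loop runs offline; in a real client it would issue OAI-PMH `ListRecords` requests. Provider names and record contents are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# fetch(provider, token) -> (records on this page, token for the next page).
# In a real harvester it would send OAI-PMH ListRecords requests; here it is
# passed in so the loop runs without network access.
Fetch = Callable[[str, Optional[str]], Tuple[List[str], Optional[str]]]


def harvest(providers: List[str], fetch: Fetch) -> Dict[str, List[str]]:
    """Pull every record from each provider, following resumption tokens."""
    catalogue: Dict[str, List[str]] = {}
    for provider in providers:
        records: List[str] = []
        token: Optional[str] = None
        while True:
            page, token = fetch(provider, token)
            records.extend(page)
            if token is None:  # no token: the provider's list is complete
                break
        catalogue[provider] = records
    return catalogue


# Hypothetical responses standing in for WDCC and a second OAI server.
PAGES = {
    ("WDCC", None): (["dif-1", "dif-2"], "t1"),
    ("WDCC", "t1"): (["dif-3"], None),
    ("Provider 2", None): (["dif-a"], None),
}

result = harvest(["WDCC", "Provider 2"], lambda p, t: PAGES[(p, t)])
print(result)  # {'WDCC': ['dif-1', 'dif-2', 'dif-3'], 'Provider 2': ['dif-a']}
```

A production harvester would typically also pass the OAI-PMH `from` argument to fetch only records changed since its previous run.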
Example search: URL: http://glue.badc.rl.ac.uk/discovery/, keyword: ECHAM4