World Data Center Climate: Status and Portal Integration GO-ESSP at LLNL
World Data Center Climate:
Status and Portal Integration
Michael Lautenschlager,
Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany
GO-ESSP at LLNL
Livermore, June 19th – 21st, 2006
WDCC Home: www.wdcc-climate.de / WDCC Contact: [email protected]
M.Lautenschlager (WDCC / MPI-M) / 15.06.06 / 1
Content:
WDCC Status
CERA Concept
Portal Integration
WDCC Content
June 2006: 590 experiments / 79,000 data sets
Data from Earth System Modelling and Related Observations
WOCE
ERA40
BALTEX
CARIBIC
GEBCO
HOAPS
IPCC
NCEP
EH5/MPI-OM
IPCC-AR4
CEOP
COSMOS
ERA15/40
Simulations @ MPI, GKSS,…
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing
Centre (DKRZ)
Data Export from WDC Climate
Corresponds to 2 – 10 TB/month
Geographical Distribution of WDCC Users
Total number of registered users: 750 (May 2006)
Data Import into WDC Climate
ECHAM5/MPI-OM IPCC AR4 Scenarios (ca. 110 TB)
6 * 10**9 BLOBs
CERA 1) Concept:
Semantic Data Management
(I) Data catalogue and pointer to Unix files
• Enable search and identification of data
• Allow for data access as they are (coarse granularity raw data files)
(II) Application-oriented data storage in BLOB tables
• Time series of individual variables are stored as BLOB entries in DB tables (fine granularity data products)
• Allow for fast and selective data access
• Storage in standard data format (GRIB, NetCDF/CF)
• Allow for application of standard data processing routines (PINGOs, CDOs)
1) Climate and Environmental data Retrieval and Archiving
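The two storage levels can be illustrated with a minimal sketch. It assumes SQLite as a stand-in for the CERA database; the table names (`catalogue`, `blob_t2m`), the file path, and the packing of a field as little-endian 32-bit floats are all hypothetical choices made for illustration, not the WDCC schema or its GRIB/NetCDF encoding.

```python
import sqlite3
import struct


def pack_field(values):
    """Serialise one flattened 2D field into bytes (32-bit floats)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_field(blob):
    """Inverse of pack_field: bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def build_demo_db():
    con = sqlite3.connect(":memory:")
    # (I) Catalogue: searchable description plus pointer to the raw file.
    con.execute(
        "CREATE TABLE catalogue "
        "(dataset TEXT PRIMARY KEY, description TEXT, file_path TEXT)"
    )
    # (II) BLOB table: one row per time step of a single variable.
    con.execute("CREATE TABLE blob_t2m (step INTEGER PRIMARY KEY, field BLOB)")
    con.execute(
        "INSERT INTO catalogue VALUES (?, ?, ?)",
        ("demo_t2m", "2 m temperature, demo run", "/archive/demo/run_0001.grb"),
    )
    for step in range(5):
        field = [273.0 + step + 0.5 * i for i in range(4)]  # tiny 2x2 grid
        con.execute("INSERT INTO blob_t2m VALUES (?, ?)", (step, pack_field(field)))
    return con


def search(con, keyword):
    """Level (I): find datasets and the raw files they point to."""
    rows = con.execute(
        "SELECT dataset, file_path FROM catalogue WHERE description LIKE ?",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


def read_steps(con, first, last):
    """Level (II): fetch only the requested part of the time series."""
    rows = con.execute(
        "SELECT step, field FROM blob_t2m "
        "WHERE step BETWEEN ? AND ? ORDER BY step",
        (first, last),
    )
    return [(step, unpack_field(blob)) for step, blob in rows]


con = build_demo_db()
```

Calling `read_steps(con, 1, 2)` touches two BLOB entries only, whereas the catalogue route returns the path of the whole raw file.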
WDCC Data Topology
Level 1 interface: metadata entries (XML, ASCII) + data files
Level 2 interface: separate files containing BLOB table data in application-adapted structure (time series of single variables)
[Diagram: Experiment description with pointer to Unix files; Dataset 1 ... Dataset n descriptions, each with a BLOB data table]
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
CERA Data Model
[Diagram: CERA data model blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Data Org, Data Access, Local Adm.]
Data matrix of model experiment
[Figure: matrix of model variables against model run time (T1, T2, T3, ..., Tn, ..., Tend)]
2D variables: T2M, Precip, SLP, ...
3D variables: Temp, Water vapour, ...
Raw data file in DKRZ archive: direct model output (1.3 – 16.2 GB)
Each column is one BLOB table in CERA-DB
2D: small BLOBs (180 KB)
3D: large BLOBs (3 MB)
Climate Model Data Structures
Preferred DB-storage structure for web-based access:
• single variable
• single level
• time series of 2D gridded data records
• Formats: GRIB-1 – NetCDF/CF (- GRIB-2)
Original data structure (4-D)
Application-related data structure (2-D)
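The step from the original structure to the application-related one can be sketched as a regrouping of raw model output into one time series per variable and level. This is an illustration only: the nested-list layout of `raw_steps`, the function name `split_into_series`, and the sample values are assumptions, and real data would be GRIB or NetCDF/CF records rather than Python lists.

```python
from collections import defaultdict


def split_into_series(raw_steps):
    """Regroup raw output into time series keyed by (variable, level).

    raw_steps: one dict per model time step, mapping a variable name to a
    list of levels; each level is a 2D grid given as a list of rows.
    """
    series = defaultdict(list)
    for step in raw_steps:
        for variable, levels in step.items():
            for level_index, grid in enumerate(levels):
                # Each (variable, level) pair collects its own 2D records.
                series[(variable, level_index)].append(grid)
    return dict(series)


# Two time steps; T2M has a single level, Temp has two levels.
raw_steps = [
    {"T2M": [[[280.0, 281.0]]], "Temp": [[[250.0, 251.0]], [[260.0, 261.0]]]},
    {"T2M": [[[282.0, 283.0]]], "Temp": [[[252.0, 253.0]], [[262.0, 263.0]]]},
]
series = split_into_series(raw_steps)
```

Each key of `series` then corresponds to one single-variable, single-level time series of 2D gridded records, which is the structure the slide names as preferred for web-based access.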
DKRZ Architecture
TX7: Intel Itanium-2 with Linux
Portal Integration
Two strategies:
One-way integration: discovery and use metadata are integrated in a central data portal in one step.
Example:
• C3Grid data catalogue (see the presentation by Heinrich Widmann)
Two-way integration: discovery metadata are integrated in a central data portal; use metadata are extracted from the remote archive when they are needed for data download and processing.
Examples:
• Primary data publication in the TIB library catalogue (STD-DOI)
• WDCC integration in NDG (NERC Data Grid)
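The difference between the two strategies can be sketched with a toy model. All class names, method names, and sample records below are hypothetical; the sketch only shows where the use metadata live and when the remote archive is contacted, not how C3Grid, STD-DOI, or NDG are implemented.

```python
class RemoteArchive:
    """Stand-in for a remote archive that owns the full metadata."""

    def __init__(self, records):
        self._records = records
        self.use_requests = 0  # counts on-demand requests for use metadata

    def all_metadata(self):
        return {key: dict(rec) for key, rec in self._records.items()}

    def discovery_metadata(self):
        return {key: rec["discovery"] for key, rec in self._records.items()}

    def use_metadata(self, key):
        self.use_requests += 1
        return self._records[key]["use"]


class OneWayPortal:
    """Copies discovery and use metadata into the portal in one step."""

    def __init__(self, archive):
        self._catalogue = archive.all_metadata()

    def find(self, keyword):
        return [k for k, rec in self._catalogue.items() if keyword in rec["discovery"]]

    def use_metadata(self, key):
        return self._catalogue[key]["use"]  # answered from the local copy


class TwoWayPortal:
    """Holds discovery metadata only; asks the archive for use metadata."""

    def __init__(self, archive):
        self._archive = archive
        self._discovery = archive.discovery_metadata()

    def find(self, keyword):
        return [k for k, text in self._discovery.items() if keyword in text]

    def use_metadata(self, key):
        return self._archive.use_metadata(key)  # fetched when needed


archive = RemoteArchive({
    "exp-1": {"discovery": "ECHAM5 scenario run", "use": {"format": "GRIB"}},
    "exp-2": {"discovery": "ocean reanalysis", "use": {"format": "NetCDF/CF"}},
})
one_way = OneWayPortal(archive)
two_way = TwoWayPortal(archive)
```

In this sketch the one-way portal never calls back to the archive after the initial copy, while the two-way portal issues one request per download or processing step.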
Primary data publication (STD-DOI)
URL: http://www.std-doi.de/
Primary Data Publication Process
Data Review
ISO 690-2: Metadata for citation of electronic media
Example: Publ.-DOI from WDCC
DOI
URN
Publ.-DOI
830 GB
Ident.-DOI
The data retrieval procedure is given at the end (user identification is required)
WDCC Metadata and OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting
WDCC OAI server at: http://uranus.dkrz.de:8080/oai/provider
(Software: dlese (www.dlese.org) + apache-tomcat 5.5.12 + Java 1.5)
- 35 IPCC experiments with more than 11,000 datasets
  Metadata Format: ISO 19115
  C3Grid (http://gsphere.awi.de:8080/gridsphere/gridsphere)
- 40 STD-DOI experiments with more than 1,700 datasets
  Metadata Format: DIF
  GO-ESSP (NDG, http://ndg.badc.rl.ac.uk/)
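A harvesting request against such a server is an HTTP GET with a `verb` argument, as defined by OAI-PMH. The sketch below only builds request URLs and does not contact the server. The base URL is taken from the slide; the `metadataPrefix` value `dif` is an assumption, since the prefixes actually configured on the WDCC server are not given here and would have to be read from a `ListMetadataFormats` response.

```python
from urllib.parse import urlencode

BASE_URL = "http://uranus.dkrz.de:8080/oai/provider"  # from the slide


def list_metadata_formats_url():
    """URL asking the server which metadata formats it can deliver."""
    return BASE_URL + "?" + urlencode({"verb": "ListMetadataFormats"})


def list_records_url(metadata_prefix, from_date=None, until_date=None):
    """URL for harvesting records, optionally restricted by datestamp."""
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        query["from"] = from_date    # OAI-PMH argument name is "from"
    if until_date:
        query["until"] = until_date
    return BASE_URL + "?" + urlencode(query)


# "dif" is a hypothetical prefix used for illustration only.
full_harvest = list_records_url("dif")
incremental = list_records_url("dif", from_date="2006-06-01")
```

The `from` argument is what makes repeated, incremental pulls cheap: a client asks only for records changed since its last harvest.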
NDG OAI Harvesting (Pull or Notification)
[Diagram: the WDCC OAI server (software: dlese) and further providers (Provider 2 with OAI Server 2, ..., OAI Server n) deliver DIF XMLs; the NDG OAI client (dlese) harvests them into the NDG catalog (record 1...n), which feeds the NDG Discovery Portal. Further labels: Process, Delivery]
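The pull side of this flow can be sketched as a loop that reads `ListRecords` responses and follows the `resumptionToken` until the server returns an empty one. The OAI-PMH namespace and element names are those of the protocol; everything else is assumed for illustration: the pages are built in memory instead of being fetched over HTTP, the identifiers and titles are made up, and `urn:example:dif` is a placeholder namespace rather than the real DIF schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dif": "urn:example:dif",  # placeholder namespace, not the real DIF one
}


def make_page(entries, token):
    """Build a fake ListRecords response (stand-in for an HTTP reply)."""
    records = "".join(
        f"<record><header><identifier>{ident}</identifier>"
        f"<datestamp>{stamp}</datestamp></header>"
        f"<metadata><DIF xmlns='urn:example:dif'>"
        f"<Entry_Title>{title}</Entry_Title></DIF></metadata></record>"
        for ident, stamp, title in entries
    )
    return (
        "<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'><ListRecords>"
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


def parse_list_records(xml_text):
    """Return (records, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": record.findtext("oai:metadata/dif:DIF/dif:Entry_Title", namespaces=NS),
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


def harvest(fetch_page):
    """Pull all pages from one provider into a catalogue keyed by identifier."""
    catalogue = {}
    token = None
    while True:
        records, token = parse_list_records(fetch_page(token))
        for rec in records:
            catalogue[rec["identifier"]] = rec
        if token is None:  # an empty token marks the last page
            return catalogue


PAGES = {
    None: make_page([("oai:example:1", "2006-06-01", "Demo experiment A"),
                     ("oai:example:2", "2006-06-02", "Demo experiment B")], "page-2"),
    "page-2": make_page([("oai:example:3", "2006-06-03", "Demo experiment C")], ""),
}
catalogue = harvest(PAGES.get)
```

A discovery portal would run `harvest` once per provider and merge the resulting catalogues; keying by identifier keeps repeated harvests idempotent.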
URL: http://glue.badc.rl.ac.uk/discovery/
Keyword: ECHAM4