What is OPeNDAP


OPeNDAP and THREDDS:
Access and Discovery of Distributed Scientific Data
Yuan Ho
Ethan Davis
UCAR Unidata
Unidata Seminar Series - 30 January 2004
Access and Discovery of
Distributed Scientific Data
• OPeNDAP – access to scientific data but
no standard inventory or discovery
mechanisms
• THREDDS – cataloging, describing, and
discovery of scientific data
What is OPeNDAP
• OPeNDAP (Open-source Project for a Network Data Access Protocol) is a protocol for accessing distributed scientific data (also known as the DODS DAP).
• OPeNDAP is a generic data exchange mechanism that lies at the core of a variety of discipline-specific data systems.
• OPeNDAP is two reference implementations of the protocol (C++ and Java).
• OPeNDAP is a software framework that simplifies all aspects of scientific data networking, allowing simple access to remote data.
• OPeNDAP is a community of users and developers.
• OPeNDAP is a non-profit corporation called OPeNDAP, Inc.
Design Principles
• The user should be able to share their
data via OPeNDAP over the network (server).
• The user should be able to use their
application package to examine or analyze
the data of interest (client).
Client/Server Interaction
• Data access (client)
  – Access to remote data in the user's normal application
    • IDL (win32)
    • Matlab
    • Ferret
    • GrADS
    • Any netCDF application
    • Excel
  – Don't need to know the format in which the data is stored
  – Can access data subsets
• Data publishing (server)
  – Network interface via HTTP
  – DAP provides a common network representation for data
  – Can serve data in various formats
    • netCDF
    • HDF
    • SQL
    • FreeForm
    • JGOFS
    • DSP
  – Allows subsetting of data
OPeNDAP Status
• OPeNDAP/DODS 3.4 release
• OPeNDAP Java 1.1.3
• OPeNDAP Data Connector 2.3X
• OPeNDAP DAP Specification 4.0
OPeNDAP Data Object
• Three important OPeNDAP data objects:
  – DDX
    • The DDX is an XML representation of the structure of all or part of a dataset, as well as a description of the variables within that dataset.
  – Blob
    • The binary data transferred from the data source to the client. The Blob contains the serialized data represented by the DDX.
  – ErrorX
    • The ErrorX object is an XML document containing information about any errors that the server may have encountered while processing a request.
DDX Example
<Dataset name="fnoc1.nc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.opendap.org/ns/OPeNDAP"
    xsi:schemaLocation="http://www.opendap.org/ns/OPeNDAP
        http://dods.coas.oregonstate.edu:8080/opendap/opendap.xsd">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>
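A client might read a DDX like the one on this slide roughly as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree, not part of any OPeNDAP library. The function names describe_arrays and blob_url are hypothetical, and the embedded document is a well-formed copy of the slide's example with the schema-location attributes left out.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the slide's xmlns declaration.
NS = {"dap": "http://www.opendap.org/ns/OPeNDAP"}

# Well-formed copy of the slide's DDX (schema-location attributes omitted).
DDX = """<Dataset name="fnoc1.nc" xmlns="http://www.opendap.org/ns/OPeNDAP">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return element.tag.rsplit("}", 1)[-1]


def describe_arrays(ddx_text):
    """Return {array name: (element type, {dimension name: size})}."""
    root = ET.fromstring(ddx_text)
    arrays = {}
    for array in root.findall("dap:Array", NS):
        dims = {d.get("name"): int(d.get("size"))
                for d in array.findall("dap:dimension", NS)}
        # Assumption: the element type is the one child that is neither
        # an Attribute nor a dimension (Float32 in the slide's example).
        elem_type = next(local_name(child) for child in array
                         if local_name(child) not in ("Attribute", "dimension"))
        arrays[array.get("name")] = (elem_type, dims)
    return arrays


def blob_url(ddx_text):
    """Return the URL of the Blob that carries the binary data, if any."""
    blob = ET.fromstring(ddx_text).find("dap:Blob", NS)
    return None if blob is None else blob.get("URL")


print(describe_arrays(DDX))
print(blob_url(DDX))
```

The DDX alone gives the client the shape and type of the variable u, and the Blob URL tells it where to fetch the matching binary data.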
Variables and Attributes
• Each variable consists of a name, a type, a value, and a collection of attributes.
  – Atomic variables: atomic data types are indivisible.
    • Integer, floating-point, string, and binary images.
    • Example:
      <Float64 name="Depth"/>
      <Binary name="sound_sample" size="17623"/>
  – Constructor variables: a constructor variable is assembled from collections of other variables, including both atomic and constructor types.
    • Array, structure, grid, and sequence.
    • Example:
      <Array name="temp">
        <Byte/>
        <dimension size="5" name="lon"/>
        <dimension size="3" name="lat"/>
      </Array>
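The split between atomic and constructor variables can be illustrated with a small Python sketch. The two tag sets below are assumptions pieced together from the kinds of variables and type names these slides mention, not a list taken from the DAP specification. The helper names kind and array_shape are hypothetical.

```python
import xml.etree.ElementTree as ET
from math import prod

# Assumed tag names, based only on the types named in these slides.
ATOMIC = {"Byte", "Int16", "UInt16", "Int32", "UInt32",
          "Float32", "Float64", "String", "URL", "Binary"}
CONSTRUCTOR = {"Array", "Structure", "Grid", "Sequence"}


def kind(element):
    """Classify a variable element as 'atomic', 'constructor', or 'other'."""
    if element.tag in ATOMIC:
        return "atomic"
    if element.tag in CONSTRUCTOR:
        return "constructor"
    return "other"


def array_shape(array):
    """Return the dimension sizes of an Array element, in document order."""
    return tuple(int(d.get("size")) for d in array.findall("dimension"))


depth = ET.fromstring('<Float64 name="Depth"/>')
temp = ET.fromstring("""<Array name="temp">
  <Byte/>
  <dimension size="5" name="lon"/>
  <dimension size="3" name="lat"/>
</Array>""")

print(kind(depth))                     # atomic
print(kind(temp))                      # constructor
print(array_shape(temp))               # (5, 3)
print(prod(array_shape(temp)))         # 15 Byte elements in total
```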
Variables and Attributes
• An attribute is composed of a name, a type, and a value.
  – Each variable may have zero or more attributes.
  – Types: Boolean, Byte, IntXX, UIntXX, FloatXX, String, URL.
  – Example:
    <Dataset name="test">
      <Structure name="measurement">
        <Attribute name="data" type="String">
          <value> 18 Mar 03</value>
        </Attribute>
        <Attribute name="other" type="Structure">
          <Attribute name="satellite_name" type="String">
            <value>GOES</value>
          </Attribute>
          <Attribute name="experiment number" type="Int32">
            <value>898976</value>
          </Attribute>
        </Attribute>
        <Float64 name="value"/>
        <Array name="time_series">
          <dimension size="32"/>
        </Array>
      </Structure>
    </Dataset>
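Because an attribute of type Structure can hold further attributes, reading them is naturally recursive. The sketch below uses a tidied, well-formed copy of the Structure from the slide and collects its attributes into a nested Python dictionary. The function name attributes_of is hypothetical, and values are kept as strings rather than converted according to the declared type.

```python
import xml.etree.ElementTree as ET

# Tidied copy of the Structure from the slide's example.
SNIPPET = """<Structure name="measurement">
  <Attribute name="data" type="String">
    <value> 18 Mar 03</value>
  </Attribute>
  <Attribute name="other" type="Structure">
    <Attribute name="satellite_name" type="String">
      <value>GOES</value>
    </Attribute>
    <Attribute name="experiment number" type="Int32">
      <value>898976</value>
    </Attribute>
  </Attribute>
  <Float64 name="value"/>
</Structure>"""


def attributes_of(element):
    """Collect the direct Attribute children of element into a dict.

    Attributes of type "Structure" become nested dicts; all other
    values are returned as stripped strings.
    """
    result = {}
    for attr in element.findall("Attribute"):
        if attr.get("type") == "Structure":
            result[attr.get("name")] = attributes_of(attr)
        else:
            values = [(v.text or "").strip() for v in attr.findall("value")]
            result[attr.get("name")] = values[0] if len(values) == 1 else values
    return result


print(attributes_of(ET.fromstring(SNIPPET)))
```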
Requests/Responses
• Responses: four categories of information pass from the server to the client.
  – Information about the data: DDX
  – The data: Blob
  – Error messages: ErrorX object
  – Information about the server: version messages and server capabilities document
• Requests: a constraint expression provides a way for a client to request certain information from a dataset, such as certain variables or parts of certain variables.
  – Projection clause: a collection of one or more Project elements.
  – Selection clause: one or more Select elements.
  – Example:
    <Constraint>
      <Project variable="/sample/temp"/>
      <Project variable="/sample/salt"/>
      <Select condition="/sample/salt>34.0" target="sample"/>
    </Constraint>
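A client could assemble a constraint like the one in the example programmatically. The following is a rough sketch with Python's standard library. The function build_constraint is hypothetical, and the element and attribute names are simply those shown on the slide.

```python
import xml.etree.ElementTree as ET


def build_constraint(projections, selections=()):
    """Build a <Constraint> document as a string.

    projections: iterable of variable paths to project.
    selections:  iterable of (condition, target) pairs to select on.
    """
    root = ET.Element("Constraint")
    for variable in projections:
        ET.SubElement(root, "Project", variable=variable)
    for condition, target in selections:
        ET.SubElement(root, "Select", condition=condition, target=target)
    return ET.tostring(root, encoding="unicode")


constraint = build_constraint(
    ["/sample/temp", "/sample/salt"],
    [("/sample/salt>34.0", "sample")],
)
print(constraint)
```

ElementTree escapes the `>` in the condition as `&gt;` when serializing. An XML parser on the receiving side reads it back as the original character.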
Problems of searching and retrieving
datasets from OPeNDAP server
• Metadata
  – Use metadata: metadata at the data level
  – Search metadata: metadata at the directory level
• OPeNDAP has been built up from the data level, with high functionality at the data acquisition level.
• The OPeNDAP AIS (Ancillary Information Service) adds metadata to the OPeNDAP data stream. The role of ancillary data is to aid the translation of, and access to, data.
• The ODC (OPeNDAP Data Connector) is more of a directory service, with limited data-searching functionality.
Summary of OPeNDAP
• The OPeNDAP data delivery architecture provides remote access to data via the Internet.
• OPeNDAP uses HTTP (or FTP, GridFTP, Telnet, et cetera) to transport its data objects.
• OPeNDAP has proved very versatile.
• XML is used for the persistent form of the data objects.
• OPeNDAP is a data access tool; it needs a data discovery tool to complement it.
THREDDS Project
• Develop a framework to bridge the gap
between data providers and data users, to
make scientific data discoverable and
usable as well as referenceable from
scientific publications and educational
materials.
• The framework should be:
– Scalable for large and small projects
– Easy to use yet powerful and flexible
– Capable of supporting various user interfaces
THREDDS Catalogs
THREDDS catalogs are for
communicating information
about a collection of datasets
• Hierarchical structure of
datasets
• Dataset access methods
• Structure on which to hang
(reference) metadata
THREDDS Catalogs
<catalog version="0.6">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore"
                xlink:href="http://server/dods/eta.xml" />
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS"
                urlPath="http://server/dods/2003092412_eta.nc" />
      </dataset>
      …
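To show how a client might use the catalog hierarchy, here is a minimal Python sketch that walks nested dataset elements and reports each access method together with the path of dataset names leading to it. The embedded catalog is a closed-off copy of the fragment above. The xmlns:xlink declaration is an assumption added so that the fragment parses, and the function name walk is hypothetical.

```python
import xml.etree.ElementTree as ET

# Closed-off copy of the slide's fragment. The xlink namespace declaration
# is an assumption added here so that the document is well-formed.
CATALOG = """<catalog version="0.6" xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore"
                xlink:href="http://server/dods/eta.xml" />
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS"
                urlPath="http://server/dods/2003092412_eta.nc" />
      </dataset>
    </dataset>
  </dataset>
</catalog>"""


def walk(element, path=()):
    """Yield (dataset name path, service type, URL path) per access element."""
    for dataset in element.findall("dataset"):
        here = path + (dataset.get("name"),)
        for access in dataset.findall("access"):
            yield here, access.get("serviceType"), access.get("urlPath")
        # Recurse into nested datasets (the hierarchical structure).
        yield from walk(dataset, here)


for names, service, url in walk(ET.fromstring(CATALOG)):
    print(" / ".join(names), "->", service, url)
```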
THREDDS Catalogs
<dc:title>NCEP Eta 80km CONUS model data</dc:title>
<dc:creator>NOAA/NCEP</dc:creator>
<dc:subject>NCEP Eta Model data; Real-time data</dc:subject>
<dc:description>
This collection of real-time NOAA/NCEP Eta model data contains five
days worth of data. The data is on a 80km CONUS grid (GRIB grid
211). Daily 00Z and 12Z runs are available where each dataset includes
analysis data and forecast data from a single Eta run. Each dataset
contains forecasts for every 6 hours going out two and a half days
(60hrs) from the run time.
</dc:description>
…
THREDDS DQC
(Dataset Query Capabilities)
• THREDDS DQC documents describe how a subset of
a data collection can be requested.
– Large and time varying data collections are cumbersome to
view as a hierarchical structure
• THREDDS DQC documents describe the set of
requests that can be made to one or more DQC
services and the form of those requests.
• THREDDS DQC documents are an abstract
representation of a collection of datasets
THREDDS DQC
Subsetting Large Collections
THREDDS DQC
<?xml version="1.0" encoding="UTF-8"?>
<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
         construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
    …
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"
            description=".5 reflectivity .54nm res 16 levels id 19/r"/>
    …
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
    …
  </selectList>
</queryCapability>
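A user interface could be driven from a DQC document by listing its selectors and the values each one offers. The Python sketch below does only that, on a copy of the example with the elided entries left out. The function name selectors is hypothetical. How the chosen values are combined with the query base (construct="append") is not spelled out on the slide, so the sketch does not attempt to build a request URL.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's DQC document without the elided ("…") entries.
DQC = """<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
         construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"
            description=".5 reflectivity .54nm res 16 levels id 19/r"/>
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
  </selectList>
</queryCapability>"""


def selectors(dqc_text):
    """Return {selector id: {"required", "multiple", "values"}}."""
    root = ET.fromstring(dqc_text)
    found = {}
    for sel in root:
        if sel.tag not in ("selectStation", "selectList"):
            continue  # skip the <query> element
        found[sel.get("id")] = {
            # A missing attribute is treated as false (an assumption).
            "required": sel.get("required") == "true",
            "multiple": sel.get("multiple") == "true",
            "values": [opt.get("value") for opt in sel
                       if opt.tag in ("station", "choice")],
        }
    return found


for name, info in selectors(DQC).items():
    print(name, info)
```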
THREDDS Services
• THREDDS catalogs are sources of
information about a collection of data on
top of which complex services can be built.
For instance, tools that:
– Provide interoperability with GIS systems
– Supply external discovery systems with
needed information (e.g., Dublin Core, DIF,
FGDC)
– Supply information to improve data display
and analysis, e.g., geolocation information
THREDDS and Discovery Systems
• To supply external discovery services with
the information they require, we need:
– The proper information added to a catalog,
e.g., title and description of a dataset, spatial
and temporal ranges, parameters, dataset ID.
– Service to provide metadata in desired
encoding
– Service to feed information to discovery
system
• Use discovery systems to search for data
THREDDS and Discovery Systems
[Diagram: THREDDS services with a data server communicating with discovery systems. Components: Catalog, Data server, Dublin Core Generator, Metadata Harvester, Metadata Repository, Search and Discovery Services, Discovery System (e.g., DLESE). Arrow labels: Reads, References, Writes, Searches.]
THREDDS Status
• Working on new versions of the catalog
and DQC schemas
• Working on updating existing tools to use
new schemas
• Working with UCAR DMWG and NCAR
CDP on enhancing descriptive metadata
• Working with OPeNDAP developers on
integrating THREDDS and OPeNDAP
OPeNDAP and THREDDS
• Enhance OPeNDAP C++ implementation
to serve THREDDS catalogs
• Replace OPeNDAP File Servers with
THREDDS DQC
OPeNDAP and THREDDS
More Information
• OPeNDAP Web page:
http://www.unidata.ucar.edu/packages/dods/
• OPeNDAP Email list: [email protected],
subscribe at http://www.unidata.ucar.edu/packages/dods/home/mailLists/
• THREDDS Email list: [email protected],
subscribe at http://www.unidata.ucar.edu/projects/THREDDS/maillists/
• THREDDS Web page:
http://www.unidata.ucar.edu/projects/THREDDS/
• Support questions: [email protected]