DAP Clients and Services

Section 3
APAC ‘07 OPeNDAP Workshop
12 Oct 2007
James Gallagher
Outline
• Browsing a Server - jump right in
• DAP Requests and Responses: background on using DAP
• Finding Data
• Types of Clients
– Graphical
– Command line
– Custom
Browsing a Server
• Type the Server’s URL into the browser
• Hyrax (and most other DAP servers)
provide a way to browse data
• Choose a data set using THREDDS
catalogs and/or common directory
traversal
• Choose one or more variables within a
data set using the HTML form interface
Open a server…
Type the server’s URL; the URL could be an entry in a catalog or an HTML page.
Contents at the top level
These links become active when a dataset is listed; for a directory, they don’t apply
Browse its directory structure
Follow the Pathfinder links down to …
…and traverse all the way
down to a file
… this point. Now we see a listing of datasets
Descend into a dataset
Open a file
Note that the URL is duplicated here.
Supply a constraint; get ASCII data
Use the form elements to build a constraint
Note that the constraint is visible here, appended
to the URL
The ASCII data view
Note the constraint, and the ‘.asc’ suffix appended to the file name just before it.
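A URL copied out of the form has three parts: the dataset URL, a response suffix such as ‘.asc’, and the constraint after the ‘?’. That means it can be taken apart and reused. The following is a minimal Python sketch that assumes exactly that layout. The suffix list is illustrative, not exhaustive. The fnoc1 URL is borrowed from the NCO example later in this section.

```python
from urllib.parse import urlsplit, urlunsplit

# Response suffixes mentioned in this section; illustrative, not exhaustive.
KNOWN_SUFFIXES = (".ascii", ".asc", ".dods", ".dds", ".das", ".ddx", ".info", ".html")


def split_dap_url(url):
    """Split a DAP request URL into (dataset_url, suffix, constraint)."""
    parts = urlsplit(url)
    path, suffix = parts.path, ""
    for candidate in KNOWN_SUFFIXES:
        if path.endswith(candidate):
            suffix = candidate
            path = path[: -len(candidate)]
            break
    dataset_url = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    # Everything after '?' is the constraint expression (left unescaped here).
    return dataset_url, suffix, parts.query


url = "http://localhost:8080/opendap/data/nc/fnoc1.nc.asc?u[0:1][0:1][0:1]"
dataset, suffix, ce = split_dap_url(url)
# Swap the suffix to request the same subset in another form, e.g. for a spreadsheet.
ascii_url = dataset + ".ascii?" + ce
```

This sketch leaves the constraint unescaped. Some clients percent-encode characters such as the square brackets before sending the request.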
Spreadsheets can often read URLs, and they can parse the CSV output of Hyrax (and most other DAP servers)
Paste a DAP URL with the ‘.ascii’ extension into the Location box
Data read into the spreadsheet. Sometimes you
have to tell the spreadsheet how to ‘import’ the
data
Browsing summary
• Directory hierarchy browsing
• Data files open to an HTML form that lets you choose variables
• The form supports interactive construction of constraint expressions and returns ASCII data
• The form interface has many limitations, but it can be used in many different situations
DAP background information
• Data are referenced by a URL
• DAP responses with metadata or data
are requested using tokens appended
to the URL
• Within a data granule, elements are accessed using a constraint expression
URLs Reference Data
• As we’ve seen, URLs reference data
granules (usually files).
• DAP version 2 (DAP2) defines three responses:
– DDS - syntactic metadata - information about the
structure of the data
– DAS - semantic metadata - background
information about the data
– DODS - data - actual data values, bundled with
syntactic metadata to form a self-contained
response.
DAP Data Model
• A Dataset is a collection of variables
(tuples of type-name-value)
• Each variable has attributes which are
also type-name-value tuples
• The Dataset may also have ‘global’
attributes
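The data model above (variables as type-name-value tuples, each with its own attributes, plus dataset-level global attributes) can be mirrored in a few Python classes. The following is a minimal sketch for illustration only. It is not how libdap or any other DAP library represents a dataset. The class names and the sample contents are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Attribute:
    """A type-name-value tuple attached to a variable or to the dataset."""
    type: str
    name: str
    value: Any


@dataclass
class Variable:
    """A type-name-value tuple plus its own attributes."""
    type: str
    name: str
    value: Any = None
    attributes: Dict[str, Attribute] = field(default_factory=dict)


@dataclass
class Dataset:
    """A collection of variables with optional 'global' attributes."""
    name: str
    variables: List[Variable] = field(default_factory=list)
    global_attributes: Dict[str, Attribute] = field(default_factory=dict)

    def variable(self, name):
        for var in self.variables:
            if var.name == name:
                return var
        raise KeyError(name)


# Hypothetical contents, for illustration only.
ds = Dataset(
    name="example",
    variables=[Variable("Float32", "temp", 12.5,
                        {"units": Attribute("String", "units", "degC")})],
    global_attributes={"title": Attribute("String", "title", "Example dataset")},
)
```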
Data Model Types
• Types of variables:
– Scalars: Byte, Integer, Float, String, URL
– Array: N-dimensional
– Structure: Simple aggregate type
– Sequence: hierarchical table data
– Grid: Array with map vectors (establishes a mapping between array indices and independent variable values)
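A Grid's map vectors let a client turn a range of independent-variable values, such as a band of latitudes, into the array indices that a constraint needs. The following is a minimal Python sketch that assumes a map vector sorted in increasing order. The latitude values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def index_range(map_vector, lo, hi):
    """Return the inclusive (start, stop) indices whose map values lie in [lo, hi].

    Assumes map_vector is sorted in increasing order; a decreasing map
    (e.g. latitude stored north to south) would need to be reversed first.
    """
    start = bisect_left(map_vector, lo)
    stop = bisect_right(map_vector, hi) - 1
    if start > stop:
        raise ValueError("no map values fall in the requested range")
    return start, stop


lat = [-90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0]  # hypothetical latitude map
start, stop = index_range(lat, -45.0, 45.0)
slab = f"[{start}:{stop}]"  # index range for the latitude dimension of a constraint
```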
Attributes
• Scalars
• Vectors
• Structures
• No Grids or Sequences
Accessing those responses
• For each of the responses, add the extension .dds, .das or .dods to the end of the URL’s ‘file name’
…or use the form interface
Other response types
• DAP4 will use XML to encode metadata and replace the two metadata objects (DDS and DAS) with a single response accessed using .ddx
• Virtually all servers support:
– Info (.info): An HTML page built using all the metadata
– HTML (.html): The HTML form interface we’ve seen
– ASCII (.asc, .ascii): The ASCII data dump,
also already seen
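Every response is selected by a suffix on the dataset URL, so a client can generate all of them from one base URL. The following is a minimal Python sketch that assumes the suffixes listed on these slides. The base URL comes from the NCO example later in this section and may not be reachable. The function name is hypothetical.

```python
# Suffixes for the responses listed above, with a short description of each.
RESPONSES = {
    "dds": "syntactic metadata (structure)",
    "das": "semantic metadata (attributes)",
    "dods": "data values bundled with the DDS",
    "ddx": "DAP4-style XML metadata",
    "info": "HTML page built from all the metadata",
    "html": "HTML form interface",
    "asc": "ASCII data dump",
    "ascii": "ASCII data dump",
}


def dap_request(dataset_url, response, constraint=""):
    """Build a request URL by appending '.<response>' and an optional '?constraint'."""
    if response not in RESPONSES:
        raise ValueError(f"unknown response type: {response}")
    url = f"{dataset_url}.{response}"
    if constraint:
        url += "?" + constraint
    return url


base = "http://localhost:8080/opendap/data/nc/fnoc1.nc"  # URL from the NCO example
metadata_urls = {r: dap_request(base, r) for r in ("dds", "das", "dods")}
subset = dap_request(base, "dods", "u,v")
```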
Aggregation
• There are several different servers which can
perform aggregation
– TDS: Array data
– GDS, Hyrax/JGOFS: Sequences (table data)
– BES (but not when used in Hyrax): Any collection
of data types aggregated to a Structure
• Aggregation maps the task of searching and selecting from an inventory onto writing a constraint expression
• Aggregation can eliminate the dichotomy
between inventory searching/access and data
access
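Consider a collection of daily files. Choosing a week of data means picking seven file names from an inventory. With an aggregated array, the same choice becomes one index range on the time dimension of a single dataset.

The following is a minimal Python sketch of the aggregated case, under these assumptions:
- the time values are sorted (here, hypothetical day numbers);
- the variable is named ‘sst’ and has [time][lat][lon] dimensions.

Both are illustrations, not a description of any particular server's aggregation.

```python
def time_window_constraint(var, times, t0, t1, other_dims):
    """Build a hyperslab constraint selecting times in [t0, t1] from an aggregated array.

    times: sorted time coordinate values of the aggregation
    other_dims: inclusive (start, stop) index pairs for the remaining dimensions
    """
    idx = [i for i, t in enumerate(times) if t0 <= t <= t1]
    if not idx:
        raise ValueError("no time steps in the requested window")
    # Because times are sorted, the matching indices form one contiguous run.
    slabs = [f"[{idx[0]}:{idx[-1]}]"]
    slabs += [f"[{start}:{stop}]" for start, stop in other_dims]
    return var + "".join(slabs)


days = list(range(31))  # hypothetical: one time step per day of a month
ce = time_window_constraint("sst", days, 10, 16, [(0, 99), (0, 199)])
```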
An example Aggregation
• http://satdat1.gso.uri.edu/thredds/dodsC/NWAtlanticDec_1km.html
THREDDS responses
• Use THREDDS to define a logical hierarchy
that’s distinct from the set of directories that
actually hold the data.
• We can request THREDDS catalog XML files
using ‘catalog.xml’ or HTML pages using
‘catalog.html’ after a directory name.
• While the directory browser works for any
directory, THREDDS catalogs are valid only
for the logical hierarchy they define
• Files/Directories not included in that hierarchy
have no catalogs
THREDDS examples
• Switch Hyrax to the THREDDS HTML view:
Choose the HTML view
The THREDDS HTML view
• The top-level THREDDS catalog on our test server
defines a single data root directory (SVN Test Data
Archive)
• This illustrates how THREDDS can be used to control
the view of data presented by the server
• Use ‘catalog.xml’ in place of ‘catalog.html’ to get the
catalog data in an XML document.
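A catalog.xml response is ordinary XML, so a client can walk it with a standard parser. The following is a minimal Python sketch, under these assumptions:
- the document uses the THREDDS InvCatalog 1.0 namespace;
- catalogRef elements carry xlink:title and xlink:href attributes.

The sample catalog is invented for illustration. Only dataset names, urlPath values and catalogRef links are extracted.

```python
import xml.etree.ElementTree as ET

THREDDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
XLINK = "{http://www.w3.org/1999/xlink}"


def catalog_url(directory_url, as_xml=True):
    """Append catalog.xml (or catalog.html) to a directory URL."""
    return directory_url.rstrip("/") + ("/catalog.xml" if as_xml else "/catalog.html")


def list_catalog(xml_text):
    """Return ([(dataset name, urlPath)], [(catalogRef title, href)])."""
    root = ET.fromstring(xml_text)
    datasets = [(d.get("name"), d.get("urlPath")) for d in root.iter(THREDDS + "dataset")]
    refs = [(r.get(XLINK + "title"), r.get(XLINK + "href"))
            for r in root.iter(THREDDS + "catalogRef")]
    return datasets, refs


SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="SVN Test Data Archive" ID="data">
    <dataset name="fnoc1.nc" ID="data/nc/fnoc1.nc" urlPath="data/nc/fnoc1.nc"/>
    <catalogRef xlink:title="hdf4" xlink:href="hdf4/catalog.xml"/>
  </dataset>
</catalog>"""

datasets, refs = list_catalog(SAMPLE)
```

Container datasets, like the top-level one here, usually have no urlPath of their own, so they come back as None.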
Traverse the links to find data
THREDDS data set page
• THREDDS catalogs can list more than one access mechanism; here we see only DAP, but WCS, WMS, etc. are other possibilities
Choosing DAP access leads
to the HTML form
DAP Summary
• DAP requests are made using a token appended to the file-name part of the URL
• Responses defined by DAP2 and the in-progress DAP4 are DDS, DAS, DODS and DDX; these return metadata and data
• Other responses are used to access ASCII
data values, HTML metadata pages and data
access interfaces
• Constraint expressions are used to limit the data returned (subsetting, projection, selection)
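Each kind of limiting maps onto one part of a DAP2 constraint expression:
- projection: a comma-separated list of variables;
- subsetting: bracketed index ranges on arrays, such as [start:stride:stop];
- selection: ‘&’-prefixed relational clauses, used with Sequences.

The following is a minimal Python sketch that assembles such a string. The variable names are hypothetical. String values in selection clauses need quoting, which this sketch leaves to the caller.

```python
def hyperslab(name, dims=None):
    """Append [i], [start:stop] or [start:stride:stop] for each dimension."""
    if not dims:
        return name
    parts = []
    for d in dims:
        if isinstance(d, int):
            parts.append(f"[{d}]")
        elif len(d) == 2:
            parts.append(f"[{d[0]}:{d[1]}]")
        else:
            parts.append(f"[{d[0]}:{d[1]}:{d[2]}]")
    return name + "".join(parts)


def constraint_expression(projections, selections=()):
    """projections: (name, dims) pairs; selections: clause strings such as 'depth>100'."""
    projection = ",".join(hyperslab(name, dims) for name, dims in projections)
    return projection + "".join("&" + clause for clause in selections)


grid_ce = constraint_expression([("u", [(0, 1, 15), (0, 16), (0, 20)]), ("v", None)])
seq_ce = constraint_expression([("cast.temp", None), ("cast.depth", None)],
                               ["cast.depth>100"])
```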
DAP Summary, cont.
• THREDDS:
– is a distinct protocol
– complements DAP
– as implemented in Hyrax, supports both HTML and XML views of the catalogs
– defines a logical hierarchy that is distinct from the way the data are actually stored
Finding Data
• Ways to find data:
– The OPeNDAP Data Set List
– GCMD
– TPAC
– Google
– THREDDS
• We maintain a page with links to dataset searching
sites:
– http://www.opendap.org/data/index.html
Common Features
• All of these data location features except Google depend on
active community involvement in building catalogs of data
• The solutions can be described as static documents or crawlers
• Google and TPAC are crawlers
– Crawlers can discover datasets without human intervention
– They can make mistakes that seem silly
• The Dataset List, GCMD and THREDDS are static documents or collections of static documents
– Static lists can be tailored by hand
– They can go out of date quickly
Differentiating Features
• Google & TPAC:
– Google only crawls HTML. If a server is not linked from an HTML page, it won’t be found.
– TPAC is preset with server locations and
picks up changes at those sites
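A small breadth-first crawl shows the difference between a crawler that only follows links and one seeded with known server locations. The following is a minimal Python sketch over an in-memory set of pages. All URLs are hypothetical, and only absolute links are handled. It illustrates the crawling idea, not how Google or TPAC are actually implemented.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for key, value in attrs if key == "href" and value)


def crawl(pages, seeds):
    """Breadth-first crawl; returns every page reachable from the seeds."""
    seen, queue = set(), deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or url not in pages:  # skip revisits and dead links
            continue
        seen.add(url)
        extractor = LinkExtractor()
        extractor.feed(pages[url])
        queue.extend(extractor.links)
    return seen


PAGES = {
    "http://portal.example/": '<a href="http://a.example/opendap/">Server A</a>',
    "http://a.example/opendap/": '<a href="http://a.example/opendap/sst.nc.html">sst</a>',
    "http://a.example/opendap/sst.nc.html": "<form></form>",
    # Server B exists, but no page links to it.
    "http://b.example/opendap/": '<a href="http://b.example/opendap/ctd.nc.html">ctd</a>',
    "http://b.example/opendap/ctd.nc.html": "<form></form>",
}

link_only = crawl(PAGES, ["http://portal.example/"])
preset = crawl(PAGES, ["http://portal.example/", "http://b.example/opendap/"])
```

Starting only from the portal page never reaches server B. Adding B to the seed list, the way a preset server list would, finds its dataset.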
Differentiating Features, cont.
• The Static Lists:
– The Dataset List has a very low metadata requirement
– It is not maintained as actively as either the GCMD or THREDDS catalogs
• GCMD:
– The GCMD has a fairly high entry threshold
– Professional staff maintain the GCMD as their sole job
• THREDDS
– THREDDS catalogs are, or can be, located with the data; this locality distributes maintenance
– Quality varies from site to site
Finding Data Summary
• Locating data seems like it would be the place to start building a system, but it’s far more varied than the one-size-fits-all approach most tried in the 1990s
• Crawlers and hierarchical lists show the most promise, but maintained centralized lists are also useful
Accessing Data with DAP
• Web Browser
– Already discussed…
• Graphical clients
– ncBrowse, ODC, Ferret, GrADS
• Command-line clients
– getdap (UNIX, win32), loaddap (Matlab, IDL), nco
(UNIX, win32)
• Custom clients
– C++, C, Java, Python
– netCDF
Using a Graphical Client
• Example: The OPeNDAP Data
Connector
• Combines data location with retrieval
and display
• Shows the built URL, including the constraint expression
– Can be transferred to another application
Start the ODC
The ODC opens to the search
pane
Five different panes
Choices within a pane
Use the dataset list to find the
TPAC climatologies
Choose the Antarctic Cooperative Research Centre TPAC/CSIRO Climatologies
…then hit ‘To Retrieve’ to move
the selection to the next pane
The Retrieve pane
Double-click ‘levitus_annual_97.nc’ to see the contents of the file in the area on the right
The ODC shows the URL as it builds it.
Click the checkboxes for SALT and O2. For both, set the range of z_index to ‘0 to 0’. Make sure to hit tab/return in the boxes.
…then hit ‘Output to’ to move to the View pane
There are a number of ways to view the data. Here the plotter has been chosen (the default).
Hit ‘Plot to’ to generate a plot using the default settings.
When the plot is made, the interface switches to the ‘Preview’ tab
Switch back to the ‘Variables’ tab to plot O2
Choose ‘O2’ from the menu, then hit ‘Plot to’.
Now that the data have been read and cached, you can switch back and forth between variables quickly without any additional data transfers
When ready, go back to the ‘Retrieve’ pane.
Choose ‘TEMP’
Set the constraint
…then plot
ODC Summary
• The ODC provides a way to search for,
access and plot data
• Acts as a ‘URL builder’: the URLs can be pasted into other applications
• We didn’t need to know anything about DAP,
its Request or Response objects or how a
URL is used to request data
• The data set list often contains stale entries
• Also supports using the GCMD for data
location - more on this when we cover
searching
Using a Command-line Client
• Matlab - demonstration
• NCO - a powerful tool developed and
maintained by another group
Matlab
• Demonstration of custom-built graphical interfaces for Matlab
• Matlab scripting is used to build the interfaces and provide some dataset-specific processing
• A Matlab command extension (written in C/C++) is used to read the data
• Two things are required in addition to Matlab: the DAP command extension (‘loaddap’) and the graphical interface software
Running the Matlab Demonstration
• Start Matlab
• Download the command extension
• Download the interface software
• In Matlab, change directory to the ‘mlocean-testbed’ directory
• Type ‘OCEAN_TOOLBOX’
• The interface will start…
The Ocean Toolbox
Open a dataset
I choose the Pathfinder
dataset
Fill in the information
SST & Quality fields
Load data into the Matlab
workspace
Get the data
Load data into the Matlab
workspace
Plot/Display the data
Using the loaddap command
extension directly
• Start Matlab
• Add the directory with the extension to
the Matlab command path
• Verify the command extension is
working
• Feed it a URL
• Plot the data
Pass a URL, constraining the response to the ‘u’ and ‘v’ vectors only
Plot those vectors; see Figure 1
Matlab Summary
• The command-line client is the tool used to move the data
• Easily used in Matlab scripts to hide the details and make custom interfaces
• To use the command extension directly, the user must know:
– Data location (URL)
– Internal structure of the data set (syntactic
metadata - DDS/DDX)
– How to write a constraint expression
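Knowing the internal structure of a dataset usually means reading its DDS. A flat DDS contains only scalars and arrays, with no Structures, Grids or Sequences. For that simple case, a short parser is enough to:
- recover each variable's type and dimension sizes;
- build a whole-array hyperslab.

The following is a minimal Python sketch. The sample DDS is made up for illustration, loosely in the style of the u/v vectors used above. Its sizes are not taken from a real server. Names containing characters outside [A-Za-z0-9_] are not handled.

```python
import re

# One declaration per line, e.g. "Int16 u[time = 16][lat = 17][lon = 21];"
DECL = re.compile(r"^\s*(\w+)\s+(\w+)((?:\s*\[[^\]]*\])*)\s*;\s*$")
DIM = re.compile(r"\[\s*(?:(\w+)\s*=\s*)?(\d+)\s*\]")


def parse_flat_dds(dds_text):
    """Map variable name -> (type, [(dimension name or None, size), ...])."""
    variables = {}
    for line in dds_text.splitlines():
        match = DECL.match(line)
        if not match:
            continue  # skips 'Dataset {', the closing '} name;' line, etc.
        vtype, name, dims = match.groups()
        variables[name] = (vtype, [(d or None, int(n)) for d, n in DIM.findall(dims)])
    return variables


def full_hyperslab(name, dims):
    """Constraint selecting the whole array, e.g. 'u[0:15][0:16][0:20]'."""
    return name + "".join(f"[0:{size - 1}]" for _, size in dims)


SAMPLE_DDS = """Dataset {
    Int16 u[time = 16][lat = 17][lon = 21];
    Int16 v[time = 16][lat = 17][lon = 21];
    Float32 lat[lat = 17];
    Float32 lon[lon = 21];
} example;"""

info = parse_flat_dds(SAMPLE_DDS)
u_ce = full_hyperslab("u", info["u"][1])
```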
NetCDF Operators (NCO)
• Unix command line client
• Unlike the previous two clients, NCO uses the
netCDF client library to read from a DAP
server
– A client library is a collection of functions which
hide the mechanics of (most of) the interaction
with a server so the client can go about its
business
– The NCO client is, in fact, just the NCO package linked to our (OPeNDAP’s) version of the netCDF library (a.k.a. the netCDF client library)
Build the NCO Software
• Change directory to /root/src/nco-3.9.2
– root@slax:~# cd /root/src/nco-3.9.2
• Run configure to build the Makefile,
then build and install the software
– root@slax:~# ./configure
– root@slax:~# make
– root@slax:~# make install
Use NCO to Convert the
FNOC1 vectors into a speed
• NCAP: NCO Arithmetic Processor
– ncap -O -s "windspeed=sqrt(u^2+v^2)" http://localhost:8080/opendap/data/nc/fnoc1.nc wndspd.nc
• The URL is the input ‘file’ and
wndspd.nc is the output
• Use ncdump to look at the result file
– ncdump -h wndspd.nc
– ncdump -v windspeed wndspd.nc
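The ncap script computes, point by point, the magnitude of the (u, v) vector. The following minimal Python sketch does the same arithmetic, which can help when spot-checking a few values from the ncdump output. The sample numbers are arbitrary. Any scale_factor or add_offset on u and v is assumed to have been applied already.

```python
import math


def wind_speed(u, v):
    """Element-wise sqrt(u**2 + v**2) for equal-length sequences of components."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return [math.hypot(a, b) for a, b in zip(u, v)]


speeds = wind_speed([3.0, 0.0, -6.0], [4.0, 2.0, 8.0])
```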
View the Result: ncBrowse
• We can use ncBrowse to look at the local netCDF file we just built
• ncBrowse can also look at the DAP
server directly
• Built using the DAP-enabled Java
netCDF library (a client library where
access to DAP servers hides behind the
netCDF API)
Start ncBrowse
Double-click on ‘windspeed’, the new variable we made with the previous NCO example
Fix up the latitude and longitude axes, then hit ‘Graph Variable’
We have to be somewhat savvy about the
units - check back and look at the attributes…
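Being savvy about units usually means reading attributes such as units, scale_factor, add_offset and _FillValue before trusting the numbers. The following is a minimal Python sketch of the common netCDF packing convention:
- value = raw * scale_factor + add_offset;
- raw values equal to _FillValue are treated as missing.

Whether a particular variable uses these attributes has to be checked in its DAS. The values below are hypothetical.

```python
def unpack(raw_values, attributes):
    """Apply scale_factor/add_offset; return None where raw equals _FillValue."""
    scale = attributes.get("scale_factor", 1.0)
    offset = attributes.get("add_offset", 0.0)
    fill = attributes.get("_FillValue")
    return [None if (fill is not None and r == fill) else r * scale + offset
            for r in raw_values]


attrs = {"units": "meter per second", "scale_factor": 0.5, "add_offset": 1.0,
         "_FillValue": -32767}
values = unpack([2, 4, -32767], attrs)
```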
Custom clients
• Options for building clients:
– C++ using libdap
– C using Ocapi
– C, Fortran using the netCDF client library
– Python using PyDAP
– Java using Java-OPeNDAP
– Matlab & IDL using the respective versions
of loaddap
Clients Summary
• Custom clients offer an opportunity to develop
for a specific audience or a particular
problem/project.
– Example: ComMIT Tsunami inundation model
client developed by NOAA/PMEL and BOM
• General-purpose clients like loaddap can read any kind of data, while clients built using the netCDF client library are limited to the semantics of netCDF
– Example: Record access is slow because each access is a separate network request
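An application can soften the per-record cost by grouping the records it needs into contiguous index ranges, then requesting each range with a single hyperslab. The following is a minimal Python sketch of that grouping. It is an illustrative application-level workaround, not a feature of the netCDF client library. The variable name is hypothetical.

```python
def coalesce(indices):
    """Group record indices into sorted, inclusive (start, stop) runs."""
    runs = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i  # extend the current run
        else:
            runs.append([i, i])  # start a new run
    return [tuple(run) for run in runs]


def range_requests(var, indices):
    """One '<var>[start:stop]' constraint per contiguous run of records."""
    return [f"{var}[{start}:{stop}]" for start, stop in coalesce(indices)]


wanted = [5, 1, 2, 3, 7, 8]
constraints = range_requests("time", wanted)  # three requests instead of six
```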
Clients Summary, cont.
• ODC: A client built specifically to provide a ‘browse’
capability for any data source
– Uses Java-OPeNDAP
• Loaddap: a client built to read any data into an
analysis application
– Can be used as a building block for more sophisticated
applications
– Uses libdap (C++, Matlab) or Ocapi (C, IDL)
• netCDF client library: A client-building tool
– convert ‘legacy’ code
– provide a simple way to write new applications
– C++, C, Fortran