BarkerGallagher_final

NOAA’s National Ocean Service • Office of Response and Restoration
Serving unstructured grids using
OPeNDAP: Using server-side operations
to subset and subsample data
Christopher Barker
NOAA Office of Response & Restoration
Emergency Response Division
James Gallagher
OPeNDAP, Inc.
NOAA Emergency Response
Division
• National Contingency
Plan specifies NOAA’s
role in supporting the
Coast Guard:
“Provide scientific expertise to support an
incident response for Oil and Chemical Spills”
Key Role: Trajectory Modeling
• Where is the oil (or
chemical) going?
Primary Tool: GNOME
(General NOAA Operational Modeling Environment)
• Lagrangian element
(particle) model
• Forcing from external
sources:
– Winds
– Currents
• Currents:
– In house model
– External operational models
GOODS
GNOME Online Operational Data Server
Example: Deepwater Horizon
• Ocean models utilized:
– NOAA CSDL: NGOM
– Navy models: NCOM, HYCOM, IASNFS
– USF: West Florida Shelf ROMS
– TGLO/TAMU: TX shelf ROMS
– NC State: SABGOM
– All structured grid models
Unstructured Grid Models?
• Unstructured Grids:
– Allow resolution to vary spatially
– Conform to boundaries
• Nice for oil spills and particle tracking
• Many more UGRID models coming online
– Many papers at this conference
Some Models of Interest
• FVCOM:
– nGOMOFS (NOAA CSDL)
– Gulf of Maine/Mass Bay (UMASS)
– Salish Sea (PNNL)
• SELFE:
– Columbia River (OHSU)
– Texas Estuaries models (UT)
• ADCIRC:
– Gulf of Mexico / Southern LA and Texas grid
9,108,128 nodes--18,061,765 elements
nGOMOFS
(NOAA CSDL)
V6
90,310 Nodes
174,550 Elements
What if I just need Mobile Bay?
Mobile Bay, AL detail grid.
About 300 m grid resolution
along a 13 m deep navigation
channel
FVCOM-GoM/GB for Mass
Bay and Nantucket
Sounds/Shoals
Boston Inner Harbor
ADCIRC:
Gulf of Mexico / Southern LA and
Texas grid (SL18TX)
• Gulf of Mexico / Southern LA and Texas grid
9,108,128 nodes--18,061,765 elements
• Just surface currents:
– 275 MB per time step
(plus the grid specs)
Obstacles to using UGRID models:
• No standard for data/results on UGRIDS:
– Informal working group for (quite!) a few years
– Recent draft standard (netCDF3)
– Work on the netCDF-Java library to support it
(SURA modeling test bed project)
• Big Grids:
– Need server side subsetting
How to get it done?
• NOAA/ORR post-DWH funding:
– Better able to respond to large spills
• We started talking to folks about server-side
subsetting options
• But we’re clients:
– We’re not going to run a server
• We needed something that would
become an accepted standard/tool.
How to get it done?
• NOAA/NESDIS noted assorted issues:
– netCDF/OPeNDAP development funding limited
– Multiple diverging implementations:
“Unfunded Mandate”
• NESDIS coordinated funding from:
– Technology, Planning and Integration for
Observations (TPIO) Program
– OR&R
– National Climatic Data Center (NCDC)
OPeNDAP-Unidata Linked Servers
(OPULS)
• NOAA/BAA grant supports this important collaboration
between Unidata & OPeNDAP
• First goal: conformance between OPeNDAP & Unidata servers,
through which access is gained to growing amounts of NOAA &
related data. Other short-term goals include:
– Asynchronous modes, such as are needed for (delayed) access to nearline data, e.g., data stored on tape
– Improved access (with server-side subsetting) to data organized on nonrectangular meshes, such as in coastal modeling
• Work began in Boulder during October & will be influenced by
an advisory committee (yet to be appointed)
OPeNDAP:
the Data Access Protocol
• DAP2 combines simple data model with a
general set of operators.
– Data Model: Atomic types (e.g., ‘Integer’); Arrays;
Structures; Grids; and Sequences.
– Operators: These provide ways to subset all but
the atomic types.
– Domain neutral: By keeping the semantics of the
model clean, we ensure that it can be applied to
many different types of data.
But how is it used?
• DAP is generally used as a ‘web service’
• DAP requests are made using a URL
• DAP responses are ‘documents’:
– Text that contains metadata
– Combination of text/metadata and binary data.
• Applications read these responses and use
them in whatever ways they see fit:
– the netCDF client library makes legacy
applications believe they are reading from a local
file
About Array and Grid Selection
• In addition to requesting a Grid or Array, the
Selection can be used to subset in indicial
space.
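Index-space selection can be illustrated by translating a DAP2 array selection into Python slices. This is a minimal sketch, assuming the DAP2 bracket forms `[i]`, `[start:stop]`, and `[start:stride:stop]` with an inclusive stop index; the function names `parse_hyperslab` and `result_shape` are hypothetical and not part of any OPeNDAP library.

```python
import re


def parse_hyperslab(selection):
    """Translate a DAP2 index selection such as '[0][3:6][5:8]' into Python slices.

    DAP2 bounds are inclusive, so the stop index is shifted by one to match
    Python's half-open slices.
    """
    slices = []
    for body in re.findall(r"\[([^\]]*)\]", selection):
        parts = [int(p) for p in body.split(":")]
        if len(parts) == 1:        # [i]: a single index
            start, stride, stop = parts[0], 1, parts[0]
        elif len(parts) == 2:      # [start:stop]
            start, stride, stop = parts[0], 1, parts[1]
        elif len(parts) == 3:      # [start:stride:stop]
            start, stride, stop = parts
        else:
            raise ValueError(f"unsupported selection: [{body}]")
        slices.append(slice(start, stop + 1, stride))
    return slices


def result_shape(selection):
    """Number of elements returned along each dimension of the selection."""
    return tuple(len(range(s.start, s.stop, s.step))
                 for s in parse_hyperslab(selection))
```

For example, `result_shape("[0][3:6][5:8]")` reports one time step and a 4 x 4 block, because both ends of each range are included.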
About Functions
• Constraint Expression can contain functions
• These functions can perform any operation
that can be programmed.
• Thus they provide a good way to extend a data
server to perform new operations
• These include operations that are not domain
neutral
• In Hyrax they are written in C++
Example URLs
• The base URL: “http://test.opendap.org/opendap/data/nc/fnoc1.nc”
• To get metadata:
– Dataset variables: http://test.opendap.org/opendap/data/nc/fnoc1.nc.dds
– … attributes: http://test.opendap.org/opendap/data/nc/fnoc1.nc.das
– Or less readable in XML: http://test.opendap.org/opendap/data/nc/fnoc1.nc.ddx
• To get data:
– Just the variables u and v:
http://test.opendap.org/opendap/data/nc/fnoc1.nc.dods?u,v
– … in ASCII so it’s easy to read: http://…/opendap/data/nc/fnoc1.nc.asc?u,v
• With subsetting:
– http://test.opendap.org/opendap/data/nc/fnoc1.nc.asc?u[0][3:6][5:8]
• Here’s a function:
– http://…/nc/coads_climatology.nc.ascii?geogrid(SST,45,-80,20,60,"1000<TIME<3000")
– This is an example of how functions can enable domain-specific behavior; this
function will return an error if the Grid is not ‘geospatial’
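The URL patterns on this slide can be assembled programmatically. The sketch below only builds strings and makes no network request; the helper names `dap2_url` and `projection` are hypothetical, and it assumes the constraint expression is appended verbatim after `?` (some clients may need to percent-encode the brackets).

```python
BASE = "http://test.opendap.org/opendap/data/nc/fnoc1.nc"


def projection(variables):
    """Build a constraint expression from {variable name: index selection}.

    An empty selection string requests the whole variable.
    """
    return ",".join(name + selection for name, selection in variables.items())


def dap2_url(base, suffix, constraint=""):
    """Append a response suffix (dds, das, ddx, dods, asc) and an optional constraint."""
    url = f"{base}.{suffix}"
    if constraint:
        url += "?" + constraint
    return url


# Metadata request: the dataset's variables
dds_url = dap2_url(BASE, "dds")
# Binary data for u and v
data_url = dap2_url(BASE, "dods", projection({"u": "", "v": ""}))
# ASCII response with an index-space subset of u
subset_url = dap2_url(BASE, "asc", projection({"u": "[0][3:6][5:8]"}))
```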
Challenges
• Unstructured Grids are not a specific type in DAP
• We must choose a way, or set of ways, to represent
these data
• Datasets are often too large to download –
subsetting must be done server-side.
• Because the subsetting operations are complex, we
will need to use server-side functions to implement
them
Requirements
• Must enable subsetting
by polygonal regions
• The result must be an
unstructured grid itself
• A subset must preserve
the topological and geometric
relationships present in the whole:
– we can’t just regrid everything to a more
convenient form.
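To make these requirements concrete, here is a minimal client-side sketch of polygon subsetting on a triangular mesh. It is an illustration under stated assumptions, not the server-side implementation: it keeps an element only when all of its nodes fall inside the polygon, uses a simple ray-casting test, and renumbers the connectivity so the result is again a self-contained unstructured grid. The names `point_in_polygon` and `subset_mesh` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_mesh(nodes, elements, polygon):
    """Return (sub_nodes, sub_elements, kept_node_ids) for the mesh inside polygon.

    nodes:    list of (x, y) coordinates
    elements: list of node-index triples (triangles)
    Node-to-element connectivity is preserved; only the numbering changes.
    """
    inside = [point_in_polygon(x, y, polygon) for x, y in nodes]
    kept_elements = [e for e in elements if all(inside[i] for i in e)]
    kept_node_ids = sorted({i for e in kept_elements for i in e})
    new_index = {old: new for new, old in enumerate(kept_node_ids)}
    sub_nodes = [nodes[i] for i in kept_node_ids]
    sub_elements = [tuple(new_index[i] for i in e) for e in kept_elements]
    return sub_nodes, sub_elements, kept_node_ids
```

Returning `kept_node_ids` lets a caller pull the matching values out of any node-centered variable (for example, currents) so the data stay aligned with the subset grid.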
Proposed Solution
• Server-side function to add subsetting
• Adopt the proposed unstructured grid
encoding using netCDF3
• Result of the function will be a DAP2 response
– Input is netCDF3 with some additional
‘conventions’: it can be represented in DAP2
– There are existing clients that can read DAP2
• If they understand netcdf in the new convention, they
will understand the results
The server-side function
• ugrid(Mesh,<polygon>)
– <polygon> is a comma-separated list of latitude
and longitude points
– However, there is an arbitrary limit to the number
of characters in a URL, so
• We will also support POST when OPULS makes
the transition to DAP4
– It will likely take more than a year for all of DAP4
to be realized, but POST for constraint expressions
will be set in the first year.
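The URL-length concern can be sketched as a client-side decision between GET and POST. Everything here is an assumption for illustration: the 2000-character threshold is an arbitrary stand-in for whatever limit a given server or proxy enforces, the `ugrid(variable, lat, lon, ...)` argument layout follows the example calls in this talk, and the helper names and the `example.org` address are hypothetical.

```python
MAX_URL_CHARS = 2000  # hypothetical limit; real limits vary by server and proxy


def ugrid_constraint(variable, points):
    """Format a ugrid() call from a variable name and a list of (lat, lon) pairs."""
    coords = ",".join(f"{lat},{lon}" for lat, lon in points)
    return f"ugrid({variable},{coords})"


def choose_method(base_url, constraint, max_chars=MAX_URL_CHARS):
    """Return ('GET', full_url) when the URL fits, else ('POST', base_url).

    In the POST case the constraint would travel in the request body instead.
    """
    url = f"{base_url}?{constraint}"
    if len(url) <= max_chars:
        return "GET", url
    return "POST", base_url
```

A two-point box fits easily in a GET request, while a polygon with a thousand vertices would exceed the assumed limit and fall back to POST.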
Example ugrid() calls
• http://…/model.nc?ugrid(SST,45,-80,20,-60)
– When ugrid() is called with two points, it
will assume the polygon is a box.
• http://…/model.nc?ugrid(SST,45,-80, 45,-60, 20,-60, 20,-80)
– Here the polygon is the same box as above.
– There’s an understood edge connecting the first and last points
– Point order is important – self-intersecting polygons will raise an
error.
• http://…/model.nc?ugrid(SST, -71.03, 42.38, -71.06, 42.37, -71.06, 42.36, -71.06, 42.35, -71.04, 42.33, -71.01, 42.34, -71.01, 42.35, -71.03, 42.38)
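The two rules described above (two points imply a box; self-intersecting polygons are an error) can be sketched on the client side as follows. This is an illustration only, with hypothetical function names; the intersection test detects proper crossings of non-adjacent edges and does not handle collinear overlaps.

```python
def box_from_corners(p1, p2):
    """Expand two opposite (lat, lon) corners into a four-point box polygon."""
    (lat1, lon1), (lat2, lon2) = p1, p2
    return [(lat1, lon1), (lat1, lon2), (lat2, lon2), (lat2, lon1)]


def _orient(a, b, c):
    """Signed area test: > 0 if a, b, c turn one way, < 0 the other, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 cross at an interior point of both."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def is_self_intersecting(polygon):
    """Check every pair of non-adjacent edges, including the implied closing edge."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Adjacent edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False
```

Swapping two vertices of the box produces a "bowtie", which shows why point order matters.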
Implementation
• We will use the Gridfields library [Howe 05]
• The library will be extended to work with the
new netCDF3 file format:
“Deltares CF proposal for Unstructured Grid
data model”
• And to work with DAP
[Howe 05] Bill Howe, David Maier, “Algebraic Manipulation of
Scientific Datasets,” VLDB Journal, 14(4) 2005
Progress so far
• Gridfields has already been used to build a
simpler server-side demonstration function
• The Gridfields code has adopted GNU’s
autotools to streamline its build.
• We will factor out the C++ code into its own
project, separate from the Python layer
• This will simplify moving gridfields into the
Linux community builds
Summary
• Ugrid models are seeing wide deployment
• Subsetting UGrids on the server is critical to
the wide use of model results
• UGrids will be encoded in netCDF3
• We will use a widely available open-source
library to perform the actual operations
• The results will be valid UGrids, in DAP
• The work has begun
Use for Curvilinear grids, too?
• Capture arbitrary polygon subset.
• Rectangle in geo-coordinates not a rectangle
in grid coordinates
– We generally oversample.
- But that’s not always a good solution for highly
deformed grids.
- What would the result look like?
- A new structured grid?
- An unstructured grid?
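The oversampling approach mentioned above can be sketched as finding the smallest index-space box that covers every node inside a geographic rectangle. This is a minimal illustration with a hypothetical function name and plain nested lists for the coordinate arrays; on a rotated or deformed grid the returned index box contains more nodes than actually fall inside the rectangle.

```python
def bounding_index_box(lon, lat, lon_min, lon_max, lat_min, lat_max):
    """Return (j0, j1, i0, i1), inclusive, covering all nodes inside the rectangle.

    lon, lat: 2-D lists indexed [j][i] holding node coordinates.
    Returns None when no node falls inside the rectangle.
    """
    hits = [(j, i)
            for j, row in enumerate(lon)
            for i, x in enumerate(row)
            if lon_min <= x <= lon_max and lat_min <= lat[j][i] <= lat_max]
    if not hits:
        return None
    js = [j for j, _ in hits]
    idx = [i for _, i in hits]
    return min(js), max(js), min(idx), max(idx)


# A 5 x 5 grid rotated 45 degrees relative to the coordinate axes.
grid_lon = [[i - j for i in range(5)] for j in range(5)]
grid_lat = [[i + j for i in range(5)] for j in range(5)]
index_box = bounding_index_box(grid_lon, grid_lat, -1, 1, 3, 5)
```

In this rotated example only five nodes lie inside the rectangle, but the covering index box holds nine, which is the oversampling cost the slide refers to.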
Further Discussion, etc.
• Meet here at ECM:
– Lunch Wed?
• Discussion on UGRID Google group:
https://groups.google.com/group/ugrid-interoperability
• OPeNDAP Wiki:
http://docs.opendap.org/index.php/Projects