The NERC DataGrid

Bryan Lawrence
Director of the STFC Centre for Environmental Data Archival
(BADC, NEODC, IPCC-DDC etc)
Representing the NDG … a cast of thousands.
BADC, BODC, CCLRC, PML and SOC
Outline
• Introduction to the British Atmospheric
Data Centre and the NERC Earth
Observation Data Centre
• Introduction to the NERC DataGrid
• A tour of the underlying architecture:
from A to E, then S and a touch of Q!
• A quick walk thru the current NDG
“portal(s)”.
Seminar, Aston University, November, 2007
What is the BADC
• NERC’s designated data centre for
atmospheric science.
• "The role of the British Atmospheric Data Centre (BADC) is
to assist UK atmospheric researchers to locate, access and
interpret atmospheric data and to ensure the long-term
integrity of atmospheric data produced by Natural
Environment Research Council (NERC) projects."
• Curation and Facilitation.
Part of NCAS. Hosted at STFC.
Data Sets
“A collection of files
with a common theme
and administration”
• Ground based observation networks: Met Office surface stations
• Model output: NWP, ECMWF reanalyses & climate models
• Satellite data: TOMS, Envisat & MSG
• NERC programmes data: UTLS, CWVC & URGENT
Datasets statistics
• 147 datasets
• ~100TB
• ~80 Million files
User statistics
• Over 10,000 users registered in BADC history
• 1650 users actively downloaded data in FY06/07
• 2400 users with 1 or more licences
• 72% universities
• 67% from UK
(Figure labels: pollution chemistry; discomfort indices; ocean productivity; A & E influenza cases; castle mortar decay; atmospheric chemistry models; wind power research; bird feeding habits.)
NEODC : Core activities
NEODC: several hundred
TB archive
• Online data archive
• Satellite imagery
– LANDSAT, SPOT..
• NERC Airborne data
– NERC ARSF
– NEXTMap Britain
• Dedicated UK archive
– (A)ATSR
Top datasets
• NEXTMap Digital
Elevation Data
• AATSR
Multimission
• ARSF (NERC
Airborne Research
and Survey
Facility)
• Envisat MERIS,
MIPAS,
SCIAMACHY
NERC Data Centres
• NEODC: NERC Earth Observation Data Centre
• BADC: British Atmospheric Data Centre
• AEDC: Antarctic Environmental Data Centre
• EIC: Environmental Information Centre
• NWA: National Water Archive
• NGDC: National Geoscience Data Centre
• BODC: British Oceanographic Data Centre
• NEBC: NERC Environmental Bioinformatics Centre
Complexity + Volume + Remote Access = Grid Challenge
(Figure labels: British Atmospheric Data Centre; British Oceanographic Data Centre; simulations; assimilation)
http://ndg.nerc.ac.uk
The NDG Use Cases
1. Find data
2. Find out about data
3. Subset and difference data
4. Visualise data
5. All using SECURE interdisciplinary technology.
Sounds easy, doesn’t it?
Want interdisciplinary semantic access to information, not abstract data:
– getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
– not: getData("era40.nc", 'PTMP', 20:50, 300:340, 190:200)
– or even worse:
  for j=1990:2000
    getData("era40_"+j+".nc", 'PTMP', 20:50, 300:340)
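The gap between the semantic request and the index-based call can be sketched in Python. This is only an illustration, not NDG code: the grid, the axis names, and the functions `index_range` and `semantic_request` are all hypothetical. It shows the translation step a semantic service has to perform, turning coordinate bounds into the array index ranges that a file-level call expects.

```python
from bisect import bisect_left, bisect_right

def index_range(axis, lo, hi):
    """Return (start, stop) indices of the sorted axis values lying in [lo, hi]."""
    return bisect_left(axis, lo), bisect_right(axis, hi)

def semantic_request(axes, **bounds):
    """Translate coordinate bounds, e.g. latitude=(20, 50), into index ranges."""
    return {name: index_range(axes[name], lo, hi)
            for name, (lo, hi) in bounds.items()}

# Hypothetical 1-degree global grid with one time step per year.
axes = {
    "latitude": [float(v) for v in range(-90, 91)],
    "longitude": [float(v) for v in range(0, 360)],
    "year": list(range(1958, 2002)),
}

# The user states what they want in coordinates; indices are derived.
slices = semantic_request(axes, latitude=(20, 50), longitude=(300, 340),
                          year=(1990, 2000))
print(slices)
```

The user never sees the index ranges, so the same request keeps working if the grid resolution or file layout changes.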
The Past: NERC Metadata Gateway
• No clean handover from discovery to browse and use!
• And if I want to compare data from different locations?
  – re-discovery
  – multiple logins
  – multiple formats
  – differing information structures
• Geospatial coordinates forgotten. Time reference forgotten. Need to get entire field(s), and find correct time!
The Original Vision
(Architecture diagram. Labels: BADC NDG Wrapper; BODC NDG Wrapper; Group NDG Wrapper; online data stores; XML databases; tape robot; research group data sources; satellite; supercomputer; internet links; NERC Grid; wider Internet; NDG web portal; internet user; grid user; software agent; ESG (& other) applications.)
Standards
• ISO 19101: Geographic information – Reference model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.
NDG Metadata Taxonomy
(Diagram labels: NDG Security; CSML; MOLES; !Pain! (DIF); NumSim; Vocab Server!)
What we’ve done (are doing)
(Architecture diagram: the Original Vision diagram, with the same labels, annotated with progress. Recoverable annotations: Schema: MOLES, CSML, DIF; Protocols: OGC + OAI; thinking about ISO19139 etc.; Pylons interface & portals; Pylons-based Python portal; WCS/WMS clients; CSML Toolbox; Vocab Server; lots of population issues.)
A
Standards
• Geographic ‘features’
– “abstraction of real world
phenomena” [ISO 19101]
– Type or instance
– Encapsulate important
semantics in universe of
discourse
• Application schema
– Defines semantic content
and logical structure of
datasets
– ISO standards provide
toolkit:
• spatial/temporal
referencing
• geometry (1-, 2-, 3-D)
• topology
• dictionaries (phenomena,
units, etc.)
– GML – canonical encoding
[from ISO 19109 “Geographic information –
Rules for Application Schema”]
CSML: Context
• NERC DataGrid: the integration problem
– multiple organisations, formats, storage
mechanisms (file, relational)
– only commonality is data semantics
CSML FeatureTypes
(Importance of
governance in
feature type
definition)
Grid Series Feature
Profile Series Feature
CSML Methodology
• Formal UML definition
• Semi-automatic XML encoding based on GML (1)
• XML instances created by “scanning”
– Scanning = config file + a program which runs over the data files.
• CSML library
– Includes CSML-specific API
• CSML support within
– OGC Web Map Service
– OGC Web Coverage Service
(1) We needed some extensions, and standards modifications are under way.
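The "scanning" step (a config file plus a program that runs over the data files) might look roughly like the sketch below. It is not the CSML scanner: the configuration keys, the `scan` function, and the element names `Dataset` and `fileName` are hypothetical, and the output is not valid CSML. It only shows the shape of the idea, which is that the XML instance records where each data file lives rather than copying its contents.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical scan configuration: which files to index and how to label them.
CONFIG = {
    "dataset_id": "example_dataset",
    "pattern": "*.nc",
    "feature_type": "GridSeriesFeature",
}

def scan(file_names, config):
    """Build an XML index with one feature element per matching data file."""
    root = ET.Element("Dataset", id=config["dataset_id"])
    for name in sorted(fnmatch.filter(file_names, config["pattern"])):
        feature = ET.SubElement(root, config["feature_type"])
        # Record where the binary payload lives instead of copying it.
        ET.SubElement(feature, "fileName").text = name
    return root

# File names are given as a list here; a real scanner would walk a directory.
files = ["era40_1990.nc", "era40_1991.nc", "README.txt"]
doc = scan(files, CONFIG)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```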
Using CSML
CSML Storage Descriptor
Take home
message: CSML can
carry xlinks to real
binary payloads in
multiple formats!
CSML Storage Descriptor
CSML and xlink …
• xlink recommended in GML for linking to external content,
but no best practice established
• CSML wants to use this for linking to the storage descriptor for
coverage domain/range
• ‘simple xlink’ properties:
– role: indicates a property of remote resource
– arcrole: describes meaning of remote resource
<someGMLElement
xlink:arcrole="hasRemoteContentEmbeddedAt#localXpath"
xlink:href="storageDescriptor#portion"
xlink:role="storageSchemaIdentifier"
xlink:show="embed"
xlink:actuate="onRequest | onLoad"/>
CSML instances – xlink example
<csml:gridOrdinate>
  <csml:GridOrdinateDescription>
    <csml:coordAxisLabel>Geodetic longitude</csml:coordAxisLabel>
    <csml:coordAxisValues>
      <csml:SpatialOrTemporalPositionList>
        <csml:coordinateList srsName="WGS84">13.5 24.9 32.4 37.7 41.5 46.8 54.4 65.7
        </csml:coordinateList>
      </csml:SpatialOrTemporalPositionList>
    </csml:coordAxisValues>
    <csml:gridAxesSpanned>x</csml:gridAxesSpanned>
    <csml:sequenceRule axisOrder="+1">Linear</csml:sequenceRule>
  </csml:GridOrdinateDescription>
</csml:gridOrdinate>

<csml:coordAxisValues
    xlink:arcrole="http://ndg.nerc.ac.uk/xlinkUsage/insert#SpatialOrTemporalPositionList/coordinateList"
    xlink:href="file://myfile.nc#lon"
    xlink:role="http://ndg.nerc.ac.uk/fileFormat/netcdf"
    xlink:show="embed">
  <csml:SpatialOrTemporalPositionList>
    <csml:coordinateList srsName="WGS84"/>
  </csml:SpatialOrTemporalPositionList>
</csml:coordAxisValues>
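How a client might read the xlink attributes in the second form can be sketched with Python's standard library. This is an illustration rather than the CSML library's behaviour: the `describe_link` function is hypothetical, and the namespace URI bound to the `csml` prefix is an assumption, since the slide does not show the namespace declarations. The XLink namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
# Assumed namespace URI for the csml prefix; the slide does not show it.
CSML = "http://ndg.nerc.ac.uk/csml"

SNIPPET = f"""
<csml:coordAxisValues xmlns:csml="{CSML}" xmlns:xlink="{XLINK}"
    xlink:arcrole="http://ndg.nerc.ac.uk/xlinkUsage/insert#SpatialOrTemporalPositionList/coordinateList"
    xlink:href="file://myfile.nc#lon"
    xlink:role="http://ndg.nerc.ac.uk/fileFormat/netcdf"
    xlink:show="embed">
  <csml:SpatialOrTemporalPositionList>
    <csml:coordinateList srsName="WGS84"/>
  </csml:SpatialOrTemporalPositionList>
</csml:coordAxisValues>
"""

def describe_link(element):
    """Split the xlink attributes into the parts a reader would need."""
    href = element.get(f"{{{XLINK}}}href")
    location, _, portion = href.partition("#")
    role = element.get(f"{{{XLINK}}}role")
    arcrole = element.get(f"{{{XLINK}}}arcrole")
    return {
        "location": location,                     # where the payload lives
        "variable": portion,                      # which part of it to read
        "format": role.rsplit("/", 1)[-1],        # storage format identifier
        "insert_at": arcrole.partition("#")[2],   # local path to fill in
    }

link = describe_link(ET.fromstring(SNIPPET))
print(link)
```

A client would then open `location` with a reader chosen by `format`, read `variable`, and place the values at `insert_at`.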
Simple CSML examples
>>> import csml
>>> d=csml.parser.Dataset()
>>> d.parse('COAPEC_500YrRun_wholerun_annual_ocean.xml')
>>> d.getFeatureList()
['I4Uh81J7', 'AduifGih', 'PMHWAjfM', 'JcjheLH0', 'JjqCulWF',
'HiiJHFoS', 'KL8A8h0e', 'WCl3zlnO', 'Cbc6nuhO', 'Eu8QLcHt',
'E6oCHpra', 'RLnRcbK8', 'FaxqWqsN', 'Q9N6E5bb', 'DMfZxCrN',
'GuEegIRC', 'IpUSCCuD', 'V4s7VfJE', 'NMM772O2', 'ZpBiRSpz',
'RBHh2Vvc', 'FSSa53TD', 'RbevYfLe', 'V27HYMA3', 'Nbq3UUW9',
'O5yzxVwV', 'DHIXY5MA', 'VOY3MvMX', 'SKeDIQmh', 'V3G7g21c',
'BNiARzuZ', 'UFs5Do9B', 'DrtlWUC9', 'VR7fEIoU', 'ZTPY6iQ6',
'OZ1CosTm', 'YF03Sqo5', 'Ifrr2XBf', 'BZanwhZM', 'O82tMCbR',
'ArSBjQrC', 'HIHb61Ve', 'XWVK2rQM', 'WBMJeepx', 'GOxH2OwB',
'QS04dQNE', 'RNqv7mWG', 'ZuUJsuaB', 'QJYdLinq', 'AyRAr68f',
'ZAJykwZR', 'DvEbCTBU', 'Yb1HXe5o', 'R0aozMZG', 'VSMPmfn0',
'GPoCFTx3', 'Iy8l5ML3', 'YlfxWqOd', 'E9OmrOoW', 'CKPIcW1n',
'EoRftt49', 'Q1AvQTCZ', 'ZF7NGUVb', 'SuboVVD1', 'B24uCFy4',
'Tk9eSobX', 'JqyA7Z82', 'Gt4ny2eq', 'PR3MsqD0', 'LmWEhQmd',
'GA6T07j0']
>>> for i in d.getFeatureList():
...     ff=d.getFeature(i)
...     print ff.description.CONTENT
...
V COMPONENT OF ICE VELOCITY (M.S-1)
SNOWFALL INTO OCN/ONTO ICE KG/M2/S A
THICKNESS DIFF COEFF (OCEAN) CM2/S
…
HICE INC. DUE TO DYNAMICS M/TS
BAROCLINIC V_VELOCITY (OCEAN) CM/S
>>> f1=d.getFeature('AduifGih')
>>> dir(f1)
['ATTRIBUTES', 'CHILDREN', 'CONTENT', 'ELEMORDER', '__class__',
'__delattr__', '__dict__', '__doc__', '__getAxis',
'__getattribute__', '__hash__', '__init__', '__module__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__str__', '__weakref__', '_csElement__removeURI',
'_getReverseSubsType', '_getSubstitutionType', '_subsetGrid',
'addChildElem', 'boundedBy', 'description', 'elem', 'featureType',
'fromXML', 'getAllowedSubsettings', 'getAxisLabels',
'getBoundingBox', 'getCSMLBoundingBox', 'getDomain',
'getDomainUnits', 'getLatitudeAxis', 'getLongitudeAxis',
'getNativeCRS', 'getSliceIndices', 'getTimeAxis', 'getUom', 'id',
'name', 'parameter', 'subsetToGridSeries', 'subsetToPointSeries',
'subsetToProfile', 'subsetToProfileSeries', 'testmethod', 'toXML',
'value', '{http://www.opengis.net/gml}id']
>>> f1.getTimeAxis()
'time'
>>> print f1.getCSMLBoundingBox()
<csml.API.csmlbbox.CSMLBoundingBox object at 0x83de7ec>
>>> bb=f1.getCSMLBoundingBox()
>>> dir(bb)
['__class__', '__delattr__', '__dict__', '__doc__',
'__getattribute__', '__hash__', '__init__', '__module__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__str__', '__weakref__', 'envelope', 'getBox',
'getCRSName', 'getTimeLimits', 'maxX', 'maxY', 'minX', 'minY']
>>> print bb.getBox()
[0.625, -88.75, 359.375, 88.75]
>>> print bb.getTimeLimits()
['2790-06-01', '3289-06-01']
>>> print f1.getAxisLabels()
['longitude', 'latitude', 'height', 'time']
>>>
B
MOLES Concepts
Core linking concept is the deployment of a Data Production Tool at an Observation Station on behalf of an Activity that produces a Data Entity.
This links the metadata records into a structure that can be turned into navigable XML using XQuery, with any of the record types as the root element.
(Diagram labels: Activity; Data Production Tool; Model; Deployment; Data Entity; Observation Station; Computer)
Each of the main metadata objects has security data attached to it, so security can be applied to queries on the metadata.
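The linking idea can be sketched with plain Python dataclasses. This is a hypothetical model, not the MOLES schema: the class names, the `related` function, and the example records are made up for illustration (only the COAPEC name comes from the talk). It shows why a deployment record lets any record type act as the root of a navigation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    kind: str   # "Activity", "DataProductionTool", "ObservationStation" or "DataEntity"
    name: str

@dataclass(frozen=True)
class Deployment:
    """Ties together the four record types named in the core concept."""
    activity: Record
    tool: Record
    station: Record
    data_entity: Record

    def records(self):
        return (self.activity, self.tool, self.station, self.data_entity)

def related(root, deployments):
    """Return every record linked to `root` through any deployment."""
    found = []
    for dep in deployments:
        if root in dep.records():
            found.extend(r for r in dep.records()
                         if r != root and r not in found)
    return found

# Hypothetical example records.
activity = Record("Activity", "COAPEC")
model = Record("DataProductionTool", "coupled climate model")
computer = Record("ObservationStation", "supercomputer")
output = Record("DataEntity", "annual ocean fields")
deployments = [Deployment(activity, model, computer, output)]

# Start from the data entity and navigate to everything that produced it.
print([r.kind for r in related(output, deployments)])
```

Starting from `activity` instead of `output` works the same way, which is the sense in which any record type can be the root.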
MOLES, CSML and O&M
• OGC ‘Observations and Measurements’
(UML class diagram "cd O&M". Recoverable labels:
– «FeatureType» observation::Observation, with quality: DQ_DataQuality [0..1], responsible: CI_ResponsibleParty [0..1], result: Any
– Event; +generatedObservation 0..*
– +procedure: «Union» procedure::Procedure, with procedureUse: ProcedureEvent, standardProcedure: ProcedureSystem
– +featureOfInterest: CSML «FeatureType» Feature Types::ProfileSeriesFeature, with location: DirectPosition [0..1]; +parameter; +value
– +result: Coverage Types::ProfileSeriesCoverage (CV_DiscreteGridPointCoverage), with +/ domainSet: ProfileSeriesDomain, +/ rangeSet: Record [0..*]
– +observedProperty: «ObjectType» phenomenon::Phenomenon; AnyDefinition
– Constraint: {Definition must be of a phenomenon that is a property of the featureOfInterest})
An Observation is an Event whose result is an estimate of
the value of some Property of the Feature-of-interest,
obtained using a specified Procedure
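The definition above, together with the constraint that the observed property must be a property of the feature of interest, can be written down as a small Python sketch. The class and field names are hypothetical simplifications and do not follow the O&M or CSML schemas; the example values are made up.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet

@dataclass(frozen=True)
class Feature:
    name: str
    properties: FrozenSet[str]   # phenomena that can be observed on it

@dataclass
class Observation:
    feature_of_interest: Feature
    observed_property: str       # must be a property of the feature
    procedure: str               # how the estimate was obtained
    result: Any                  # the estimated value(s)

    def __post_init__(self):
        # Enforce the constraint from the diagram.
        if self.observed_property not in self.feature_of_interest.properties:
            raise ValueError(
                f"{self.observed_property!r} is not a property of "
                f"{self.feature_of_interest.name!r}")

# Hypothetical example values.
profile = Feature("ocean profile series", frozenset({"potential temperature"}))
obs = Observation(profile, "potential temperature", "model run", [271.3, 272.0])
print(obs.observed_property)
```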
Data Model Relationships
Convergence within O&M
Route to harmonising multiple data models
E
NumSim – Numerical Simulation descriptions
S
NDG Security
Data Resources are scarce!
– It is easy to overload data servers!
– Really, really easy!!
Data has IPR
– Folk don’t always want to share with
everyone
– Some folk want to limit access for monetary
gain
Need to control access to different users
with differing roles behind the same
server.
Authorisation
• Role-based access:
Signed “conditions of use” form exists for this dataset
<dataset>
  <host> badc.nerc.ac.uk </host>
  <name> ukmo-obs </name>
  <access-requires> researcher </access-requires>
  <access-requires> ukmo-obs </access-requires>
  <processing-requires> nerc </processing-requires>
</dataset>
• Key concept: Only hosts that trust each other share
data, even within a larger virtual organisation: e.g. at
BADC:
<trusted>
  <bodc>
    <host> ndg.bodc.nerc.ac.uk </host>
    <attribute remotename="nerc"> nerc </attribute>
    <attribute remotename="ashoe"> ashoe </attribute>
    <attribute remotename="staff"> nerc </attribute>
    <other> bodc </other>
  </bodc>
</trusted>
(not actual syntax)
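The two fragments above can be read as a policy check plus an attribute mapping between trusted hosts. The Python sketch below is not NDG Security code: the dictionaries mirror the illustrative XML, the function names are hypothetical, and it assumes that holding any one of the listed roles is enough, which the slide does not state.

```python
# Local policy for one dataset, mirroring the illustrative XML above.
DATASET = {
    "host": "badc.nerc.ac.uk",
    "name": "ukmo-obs",
    "access_requires": {"researcher", "ukmo-obs"},
    "processing_requires": {"nerc"},
}

# Trusted remote hosts, with remote attribute names mapped to local ones.
TRUSTED = {
    "ndg.bodc.nerc.ac.uk": {"nerc": "nerc", "ashoe": "ashoe", "staff": "nerc"},
}

def local_roles(host, remote_roles):
    """Map a remote user's roles to local roles; untrusted hosts get none."""
    mapping = TRUSTED.get(host)
    if mapping is None:
        return set()
    return {mapping[role] for role in remote_roles if role in mapping}

def satisfies(required, roles):
    """Assumption: holding any one of the required roles is sufficient."""
    return bool(required & roles)

# A BODC staff member arrives with the remote role "staff".
roles = local_roles("ndg.bodc.nerc.ac.uk", {"staff"})
print(roles)
print(satisfies(DATASET["processing_requires"], roles))
print(satisfies(DATASET["access_requires"], roles))
```

In this sketch the mapped role allows processing but not access to the dataset, and a user from a host that is not in the trusted list is given no local roles at all.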
NDG Security
Certificate based, pass encrypted
credentials between user and
gatekeeper.
NDG Security Components
D
A tiki tour
coapec
DIF Example - 1
(woops continuity)
Why DIF?
• NASA Global Change
Master Directory,
Directory Interchange
Format (DIF)
• (Was) relatively stable
and well described.
• Major problems with
encoding (ISO19139)
of ISO19115 content
standard
– Only recently vaguely
stable
– Still big problems with
profiling!
– OCL??
NDG Discovery
NDG Discovery Client
All NDG server and client software built using
Pylons
MDIP: Marine Data Information Partnership
All portals built by EDINA based on the NDG
Discovery SOAP Service (AND THE NDG
HARVEST INFRASTRUCTURE)
Vocab Server
Controlling all vocabularies, SOAP
and RESTful interfaces.
Providing the underpinnings of
the NDG ontologies.
Courtesy of Roy Lowry, BODC.
“GRID”
coapec
What I haven’t talked about …
• Web services:
– WMS, WCS, WPS, WFS
– (The panoply of OGC services)
• Service binding:
– (in any detail)
• And so to the walk through …