Virtual Observatories and Data
Interfaces for Atmospheric
Science
12th EISCAT International Workshop
Incoherent Scatter Radar School
Swedish Institute of Space Physics, Space Campus, Kiruna,
Sweden
Bill Rideout
MIT Haystack
Route 40, Westford, MA, USA
1-781-981-5624
[email protected]
26 August 2005
Outline
 Virtual Observatories
 Madrigal
– Web interface
– Remote API
– Extending/Contributing
– Madrigal 2.4
 Cedar Database
 Other data sources
A day in the life of an
Atmospheric Scientist
I have done an experiment with my instrument,
but now I need to …
– Search numerous websites for data
– Figure out their parameters, units
– Figure out their coordinate system, date format
– Figure out how to determine data quality
– Write code to download data, or (worse)
manually download
– Write code to convert to your format
– Finally, do science
How can Virtual observatories help?
Virtual Observatories – one stop data
shopping!
Virtual Observatories
 Ideally…
– Provide a single interface to access all data
– Know about all data sources
– Allow simple, powerful searches to discover unknown data sources
– Always get the most up-to-date data
– Use a single set of well-defined parameters
– Provide data in consistent format(s)
– Provide data in consistent coordinates
– Inform the user of contact information and rules-of-the-road for all data
Two approaches
 Top down
Build an interface
 Bottom up
Build a standard
data source
How do they work?
 Top-down approach:
– Accept that all data sources will be forever incompatible
– Build a data model so metadata can be shared
– Build a unique interface to each new data source
– Effort scales linearly with the number of data sources
– Works best with more uniform data (e.g., astronomical images)
 Bottom-up approach:
– Standardize data format and semantics
– Standardize data provider API
– Approach taken by Madrigal/Cedar
– Try for community acceptance
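The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the two sources, their field names, and the common record keys (`gdalt_km`, `ti_k`) are invented for the example and do not describe any real provider.

```python
from typing import Callable, Dict, List

# Common record shape that a hypothetical virtual observatory exposes.
CommonRecord = Dict[str, float]


def adapt_source_a(raw: dict) -> CommonRecord:
    """Hypothetical source A reports altitude in metres under 'alt_m'."""
    return {"gdalt_km": raw["alt_m"] / 1000.0, "ti_k": raw["ion_temp"]}


def adapt_source_b(raw: dict) -> CommonRecord:
    """Hypothetical source B already uses km but different key names."""
    return {"gdalt_km": raw["height"], "ti_k": raw["Ti"]}


# Top-down: one hand-written adapter per source, so the work grows
# with every source that is added.
ADAPTERS: Dict[str, Callable[[dict], CommonRecord]] = {
    "source_a": adapt_source_a,
    "source_b": adapt_source_b,
}


def fetch_top_down(source: str, raw_rows: List[dict]) -> List[CommonRecord]:
    """Convert rows from one named source into the common record shape."""
    return [ADAPTERS[source](row) for row in raw_rows]


def fetch_bottom_up(rows: List[CommonRecord]) -> List[CommonRecord]:
    """Bottom-up: providers already emit the standard, so no adapter is needed."""
    return list(rows)
```

In the top-down case every new source means a new entry in `ADAPTERS`; in the bottom-up case the cost moves to the providers, who must agree on the standard.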
What is the Madrigal
database?
 An open-source, web-based database
designed to hold one group’s data
– www.openmadrigal.org has all code and
downloads
 Built upon the Cedar database format
established over 20 years ago
 Fundamentally a data source – allows
local owners to improve/correct their data
 Designed to be used for a wide variety of
instruments
 New installations always welcome!
Madrigal Data Model
Madrigal site (typically a facility with scientists and a Madrigal installation)
↓
Instruments (ground-based, typically with a set location)
↓
Experiments (typically of limited duration, with a single contact)
↓
Experiment Files (represents data from one analysis of the experiment)
↓
Records (measurement over one period of time)
Side annotations: the upper levels are labeled "Data shared among all Madrigal sites"; the lower levels are labeled "Data unique to one Madrigal site".
Madrigal Records
Records
(measurement over one period of time)
Three types:
 Catalog record
– descriptive information about entire experiment
 Header record
– descriptive information about one section of experiment
 Data record
– Stores values
– All parameters defined by Cedar Database standard
– Contains 3 parts
 Prolog
 1D records
 2D records
Madrigal Data Records
 Prolog
– Start and end time
– Instrument id
– Kind of data id
 1D records (scalar)
– Single value parameters
 2D records (vector)
– Multiple value parameters
– All parameters must have same number of rows
– Meant to allow multiple spatial measurements
– Not meant for time variation – conflicts with Prolog!
Diagram: a data record consists of a Prolog, 1D (scalar) values (example: S/N = 2.5) and 2D (vector) values (example: Altitudes = 100, 150, 200, 250, 300, 350)
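The record layout described above can be pictured as a small Python data structure. This is a sketch for orientation only, not the Madrigal implementation: the class names, field names, and the `sn` mnemonic are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Prolog:
    """Per-record header: time span plus instrument and kind-of-data ids."""
    start_time: datetime
    end_time: datetime
    kinst: int   # instrument id
    kindat: int  # kind of data id


@dataclass
class DataRecord:
    """One data record: a prolog, scalar (1D) values and vector (2D) values."""
    prolog: Prolog
    one_d: Dict[str, float] = field(default_factory=dict)
    two_d: Dict[str, List[float]] = field(default_factory=dict)

    def nrow(self) -> int:
        """Return the number of 2D rows; raise if the 2D parameters disagree."""
        lengths = {len(values) for values in self.two_d.values()}
        if len(lengths) > 1:
            raise ValueError("all 2D parameters must have the same number of rows")
        return lengths.pop() if lengths else 0


# Example values taken from the slide: S/N = 2.5 and six altitudes.
record = DataRecord(
    prolog=Prolog(datetime(2005, 3, 19, 12, 30), datetime(2005, 3, 19, 12, 31),
                  kinst=30, kindat=3408),
    one_d={"sn": 2.5},  # 'sn' is a made-up mnemonic for this sketch
    two_d={"gdalt": [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]},
)
```

Because the prolog carries a single start and end time, every row of a 2D parameter belongs to the same time interval, which is why 2D rows are meant for spatial variation only.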
Cedar/Madrigal Database
 All parameters in file defined
– http://cedarweb.hao.ucar.edu/documents/parameters_list.txt
 Ranges of parameters for each instrument
 Data stored in one or two 16 bit ints
– Additional increment parameters
 Error parameters
– Mnemonics start with D
– Code is negative of parameter
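A rough picture of how one value can be split across two 16-bit integers, and how an error code relates to its parameter code, is sketched below. The scale factors, the function names, and the parameter code 550 are illustrative assumptions; the authoritative rules are in the Cedar format document.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def error_code(parameter_code: int) -> int:
    """The error parameter's code is the negative of the parameter's code."""
    return -parameter_code


def _check_int16(number: int, what: str) -> None:
    """Raise OverflowError if number does not fit in a signed 16-bit integer."""
    if not INT16_MIN <= number <= INT16_MAX:
        raise OverflowError(f"{what} {number} does not fit in 16 bits")


def encode(value: float, scale: float, inc_scale: float) -> tuple:
    """Split value into a coarse int16 plus an 'additional increment' int16.

    scale and inc_scale are hypothetical per-parameter scale factors.
    """
    main = int(value / scale)  # coarse part, truncated toward zero
    _check_int16(main, "main value")
    residual = value - main * scale
    inc = int(round(residual / inc_scale))  # finer remainder
    _check_int16(inc, "increment")
    return main, inc


def decode(main: int, inc: int, scale: float, inc_scale: float) -> float:
    """Recombine the two stored integers into a single double."""
    return main * scale + inc * inc_scale
```

For example, `encode(1234.567, 1.0, 0.001)` stores the pair `(1234, 567)`, and `decode` recombines them into a single double.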
Cedar/Madrigal Database,
continued
 Special values
– missing
– assumed (error value only)
– knownbad (error value only)
 Defined in
– http://cedarweb.hao.ucar.edu/cgi-bin/cedar_file_access.pl?filename=documents/cedar_fmt.pdf
Cedar Database parameters
– Example additional increment parameter
Cedar parameters - continued
 Madrigal contains many “derived only”
parameters
– Not included in Cedar standard
– Cannot be stored in Cedar file
 New python API hides the existence of
additional increment parameters
– All values are doubles
– Exceptions occur on overflow
– More later…
Madrigal Derivation Engine
 Derived parameters appear to be in file
 Assumes information can be derived from
records
– Time from prolog
– Position either as 1D or 2D
– Other parameters
 Engine determines all parameters that can
be derived
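One way to picture how an engine could work out everything that is derivable is a fixed-point loop over methods that each declare their inputs and outputs. The sketch below is not the Madrigal engine; the method names and the `slt` and `example_out` mnemonics are invented for the illustration.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Method(NamedTuple):
    """A derivation method declared by its input and output parameters."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def derivable(measured: Set[str], methods: List[Method]) -> Set[str]:
    """Return every parameter that can be derived from the measured ones."""
    known = set(measured)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for method in methods:
            if method.inputs <= known and not method.outputs <= known:
                known |= method.outputs
                changed = True
    return known - set(measured)


# Hypothetical method table; the first entry depends on the output of the last.
METHODS = [
    Method("needs_bmag", frozenset({"bmag", "ti"}), frozenset({"example_out"})),
    Method("local_time", frozenset({"ut1", "glon"}), frozenset({"slt"})),
    Method("bmag", frozenset({"ut1", "gdlat", "glon", "gdalt"}), frozenset({"bmag"})),
]
```

With time and position in the record, `derivable({"ut1", "gdlat", "glon", "gdalt"}, METHODS)` yields `slt` and `bmag`; adding a measured `ti` also unlocks the chained method on the second pass.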
Classes of derived parameters
 Space, time
– Examples: Local time, shadow height
 Geophysical
– Examples: Kp, Dst, Imf, F10.7
 Magnetic
– Examples: Bmag, Mag conjugate lat and long,
Tsyganenko magnetic equatorial plane intercept
 MSIS
– Examples: Tn, Nol
Madrigal web interface - homepage
Access Data
All Madrigal sites
Three ways to access Madrigal data:
– Data in individual experiments
– Data across experiments
– Plot data across experiments
Searching for experiments
– Choose one or more instruments
– Find any experiments with any overlap with these dates
– By default, view only most recent files
– By default, all Madrigal sites are searched
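"Any overlap with these dates" is the standard interval-overlap test. A minimal sketch, assuming closed intervals and a made-up function name:

```python
from datetime import datetime


def overlaps(exp_start: datetime, exp_end: datetime,
             search_start: datetime, search_end: datetime) -> bool:
    """True if the experiment shares at least one instant with the search range."""
    return exp_start <= search_end and search_start <= exp_end
```

An experiment that starts before the search range and ends inside it still matches, which is the behavior the search form describes.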
Madrigal experiment listing
– These links could be to experiments at any site
Madrigal experiment files – part 1
– These two files have no catalog or header records, otherwise there would be a link
– Data browser (isprint) allows viewing both measured and derived parameters with filtering
Madrigal experiment files – part 2
– Madrigal allows any additional web-compatible files to be added to the experiment
– Image-conversion feature written at EISCAT
– Notes can be added by users – also written at EISCAT
Data browser (isprint) – part 1
– Filters to reduce data: Time, Altitude, Azimuth, Elevation, …
– Users can define filters to select certain filters and parameters with one click
Data browser (isprint) – part 2
Filters, continued
– Filter data using any parameter, or the sum, difference, product or quotient of two parameters.
– Example: Nel - DNel > 1
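The filter rule above (one parameter, or an arithmetic combination of two, tested against limits) can be sketched as follows. The function name, the argument layout, and the treatment of limits as inclusive are assumptions for the example, not the isprint implementation.

```python
import operator
from typing import Dict, Optional

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def passes(row: Dict[str, float], parm1: str,
           lower: Optional[float], upper: Optional[float],
           op: Optional[str] = None, parm2: Optional[str] = None) -> bool:
    """Test parm1, or 'parm1 op parm2', against optional lower/upper limits."""
    value = row[parm1]
    if op is not None:
        value = OPS[op](value, row[parm2])
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# Made-up rows: keep only those where nel - dnel is at least 1.
rows = [{"nel": 11.5, "dnel": 10.0}, {"nel": 10.2, "dnel": 9.8}]
kept = [r for r in rows if passes(r, "nel", 1.0, None, "-", "dnel")]
```

Here only the first row survives, since its difference is 1.5 while the second row's is about 0.4.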
Data browser (isprint) – part 3
– See a full description of all parameters
– Choose parameters to display
– Measured in bold
– Derived in normal font
– Listed by category
– Click on any parameter for a full description
Data browser (isprint) – part 4
– User clicked on CHISQ
– Some parameters have a more complete description
Data browser (isprint) – part 5
– Longer description of CHISQ
Data browser (isprint) – part 6
– Show header for each record option
– String to indicate missing data
Data output
– Display only text
– Save text version to file
– Summary of selected filters
– Headers were on in this example
Second approach – Global search
– Global search for data
Global search – part 1
– Choose one or more instruments
– Choose date range (optional)
– Choose kinds of data
– Choose seasonal filter (optional)
Global search – part 2
– Filter by experiment name (optional)
– Select parameters to display
– Filter using any parameter, just like isprint
Global search – selecting parameters
– Parameters with categories and pop-up definitions as on isprint page
Global search – review search
– Review all aspects of the global search before submitting
Global search – returned message
– Message returns number of files being searched, along with rough estimate of time required.
– Since reports may take a long time to generate, an email with a link is sent when done
Third approach – Plotting across experiments
– Plotting data from various instruments across experiments
Creating plots
– Select one or more instruments. In this example Svalbard and Millstone (not visible) selected.
– Select a scatter plot or pcolor of altitude versus time
– Click here to see a list of all experiments
– Select date range (can cross experiment boundaries)
Choose single parameter to plot
– Same pop-up listing of parameters as in isprint
– Radio buttons, since only one parameter can be selected
Set up limits and filters
– Set limits on the parameter you selected
– If a pcolor plot, can set altitude limits
– Data can also be filtered using another parameter
Pcolor plot output
– Single request generates Millstone and EISCAT plots
– Plots are requested from each site simultaneously to improve performance
– Rules of road for each site shown
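Requesting plots from several sites at once, rather than one after another, can be sketched with a thread pool. This is an illustration of the idea only: `request_plots`, `fake_fetch`, and the example URLs are hypothetical, and a real `fetch` would make an HTTP request to each site.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def request_plots(site_urls: List[str],
                  fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Call fetch(url) for every site concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(site_urls))) as pool:
        results = list(pool.map(fetch, site_urls))  # order matches site_urls
    return dict(zip(site_urls, results))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request, used only for this example."""
    return ("plot from " + url).encode()


sites = ["http://site-one.example/madrigal", "http://site-two.example/madrigal"]
plots = request_plots(sites, fake_fetch)
```

With real network calls the total wait is roughly that of the slowest site instead of the sum over all sites.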
Pcolor plot output – part 2
– Can add more stacked plots with different parameters, or start over
Adding additional plots
– Now add a scatter plot of Dst with same time scale
Adding additional plots – part 2
– Time scales align with stacked plots if times not changed
Remote Access to Madrigal Data
 Built on web services
 Like the web, available from anywhere on
any platform
 Complete Matlab and Python API written
 More APIs available on request or via
contribution
Madrigal Web Services
 Simple delimited output via CGI scripts
 Not based on SOAP or XmlRpc since no
support in languages such as Matlab
 CGI arguments and output fully documented
at http://www.haystack.edu/madrigal/remoteAPIs.html
Madrigal Web Services – part 2
 To write a new API, each method must
– Take input arguments and generate the correct
CGI URL
– Parse the delimited text
– Return data to user
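The three steps above can be sketched with the Python standard library. The script name `exampleService.py`, the argument names, and the sample response text are placeholders; the real CGI scripts and their arguments are listed in the remote API documentation.

```python
from urllib.parse import urlencode


def build_url(cgi_base: str, script: str, **args) -> str:
    """Step 1: turn input arguments into a CGI URL."""
    return cgi_base.rstrip("/") + "/" + script + "?" + urlencode(args)


def parse_delimited(text: str, delimiter: str = ",") -> list:
    """Step 2: split delimited text into rows of fields, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append([item.strip() for item in line.split(delimiter)])
    return rows


# Step 3 would hand the parsed rows back to the caller; the text here is
# made up and stands in for the body of an HTTP response.
url = build_url("http://example.org/madrigal/cgi-bin/", "exampleService.py",
                kinst=30, year=1998)
rows = parse_delimited("30,Example Radar A\n\n31,Example Radar B\n")
```

Because the output is plain delimited text, the same two helpers are all a new language binding needs besides an HTTP client.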
Matlab Remote API
 Methods
– getInstrumentsWeb
– getExperimentsWeb
– getExperimentFilesWeb
– getParametersWeb
– isprintWeb
– madCalculatorWeb
 Methods match Madrigal model
Simple Matlab example
filename = '/usr/local/madroot/experiments/2003/tro/05jun03/NCAR_2003-06-05_tau2pl_60_uhf.bin';
eiscat_cgi_url = 'http://www.eiscat.se/madrigal/cgi-bin/';
% download the following parameters from the above file: ut, gdalt, ti
parms = 'ut,gdalt,ti';
filterStr = 'filter=gdalt,200,600 filter=ti,0,5000';
% Matlab Madrigal API call
% returns a three dimensional array of double with the dimensions:
%
% [Number of rows, number of parameters requested, number of records]
%
% If error or no data returned, will return error explanation string instead.
data = isprintWeb(eiscat_cgi_url, filename, parms, filterStr);
Simple Matlab example, continued
 In real code, use the higher-level methods to search for the filename
 The entire web interface could be built via remote calls
 See http://madrigal.haystack.edu/madrigal/remoteMatlabAPI.html for complete documentation and more examples
Simple Python example
import madrigalWeb.madrigalWeb

# create the main object to get all needed info from Madrigal
madrigalUrl = 'http://www.haystack.mit.edu/madrigal'
testData = madrigalWeb.madrigalWeb.MadrigalData(madrigalUrl)
# get all MLH experiments in 1998
expList = testData.getExperiments(30, 1998,1,1,0,0,0,1998,12,31,23,59,59)
for exp in expList:
    # print out all experiments
    print(exp)
# print list of all files in first experiment
fileList = testData.getExperimentFiles(expList[0].id)
for thisfile in fileList:
    print(thisfile)
Python Remote API
 Similar methods to Matlab
 Fully documented with examples
 Used to implement plotting across multiple sites
 Used by SuperDARN to constantly poll for real-time Millstone Hill data
 See http://madrigal.haystack.edu/madrigal/remotePythonAPI.html for documentation and more examples
Extending/contributing to Madrigal
 Madrigal is completely open source
 See www.openmadrigal.org for CVS
 All new code is C/Python, with some Tcl.
 Extending the Madrigal derivation engine is simple
Extending the Madrigal derivation engine
 Simply a list of methods with input Madrigal parameters and output Madrigal parameters
– int methodName(int inCount, double * inputArr, int outCount, double * outputArr, FILE * errFile)
 Register parameters in list
 Details at http://madrigal.haystack.edu/madrigal/extendingMaddata.html
Example – Tsyganenko parameters
/***********************************************************************
* getTsygan derives field line crossing points using Tsyganenko model.
*
* arguments:
*    inCount (num inputs) = 5 (UT1, UT2, GDLAT, GLON, GDALT)
*    inputArr - double array holding:
*        UT1 - UT at record start
*        UT2 - UT at record end
*        GDLAT - geodetic latitude
*        GLON - geodetic longitude
*        GDALT - geodetic altitude
*    outCount (num outputs) = 4
*    outputArr - double array holding:
*        TSYG_EQ_XGSM - X GSM value where field line crosses GSM XY plane
*        TSYG_EQ_YGSM - Y GSM value where field line crosses GSM XY plane
*        TSYG_EQ_XGSE - X GSE value where field line crosses GSE XY plane
*        TSYG_EQ_YGSE - Y GSE value where field line crosses GSE XY plane
*
* Algorithm: See Geopack_2003.f, T01_01.f
* returns - 0 (successful)
*/
int getTsygan(int inCount,
              double * inputArr,
              int outCount,
              double * outputArr,
              FILE * errFile)
New features of Madrigal 2.4
 Plotting (as demonstrated)
 Automatic updating of all geophysical data
 Capture of user name, email, organization
– Web
– Remote API
 Simple python class to create/edit Madrigal files
 Simple scripts/API to create experiments, add
files, update metadata
Creating files with python – example
"""create a file with two data records"""
import datetime
import madrigal.metadata
import madrigal.cedar

################# sample data #################
kinst = 30     # instrument identifier of Millstone Hill ISR
modexp = 230   # id of mode of experiment
kindat = 3408  # id of kind of data processing
nrow = 5       # all data records have 5 2D rows
SYSTMP = (120.0, 122.0)
TFREQ = (4.4E8, 4.4E8)
GDALT = ((70.0, 100.0, 200.0, 300.0, 400.0),
         (70.0, 100.0, 200.0, 300.0, 400.0))
GDLAT = ((42.0, 42.0, 42.0, 42.0, 42.0),
         (42.0, 42.0, 42.0, 42.0, 42.0))
GLON = ((270.0, 270.0, 270.0, 270.0, 270.0),
        (270.0, 270.0, 270.0, 270.0, 270.0))
TR = (('missing', 1.0, 1.0, 2.3, 3.0),
      ('missing', 1.0, 1.7, 2.4, 3.1))
DTR = (('missing', 'assumed', 'assumed', 0.3, 0.7),
       ('missing', 'assumed', 0.7, 0.4, 0.5))
Creating files with python – part 2
newFile = '/tmp/testCedar.dat'
# create a new Madrigal file
cedarObj = madrigal.cedar.MadrigalCedarFile(newFile, True)
# create all data records - each record lasts one minute
startTime = datetime.datetime(2005, 3, 19, 12, 30, 0, 0)
recTime = datetime.timedelta(0,60)
for recno in range(2):
    endTime = startTime + recTime
    # microsecond//10000 converts microseconds to whole centiseconds
    dataRec = madrigal.cedar.MadrigalDataRecord(kinst,
        kindat, startTime.year,
        startTime.month, startTime.day,
        startTime.hour, startTime.minute,
        startTime.second, startTime.microsecond//10000,
        endTime.year, endTime.month,
        endTime.day, endTime.hour,
        endTime.minute, endTime.second,
        endTime.microsecond//10000,
        ('systmp', 'tfreq'), ('gdalt', 'gdlat', 'glon', 'tr', 'dtr'), nrow)
Creating files with python – part 3
    # (still inside the "for recno" loop started in part 2)
    # set 1d values
    dataRec.set1D('systmp', SYSTMP[recno])
    dataRec.set1D('tfreq', TFREQ[recno])
    # set 2d values
    for n in range(nrow):
        dataRec.set2D('gdalt', n, GDALT[recno][n])
        dataRec.set2D('gdlat', n, GDLAT[recno][n])
        dataRec.set2D('glon', n, GLON[recno][n])
        dataRec.set2D('tr', n, TR[recno][n])
        dataRec.set2D('dtr', n, DTR[recno][n])
    # append new data record
    cedarObj.append(dataRec)
    startTime += recTime
# write new file
cedarObj.write()
Editing files with python
"""increases all values of Ti by 20%"""
import madrigal.metadata
import madrigal.cedar
orgFile = '/opt/madrigal/experiments/1998/mlh/20jan98/mil980120g.003'
newFile = '/tmp/mil980120g.003'
# read the Madrigal file into memory
cedarObj = madrigal.cedar.MadrigalCedarFile(orgFile)
# loop through each record, increasing all Ti values by a factor of 1.2
for record in cedarObj:
    # skip header and catalog records
    if record.getType() == 'data':
        # loop through each 2D row
        for row in range(record.getNrow()):
            presentTi = record.get2D('Ti', row)
            # make sure it's not a special string value, e.g. 'missing'
            if not isinstance(presentTi, str):
                record.set2D('Ti', row, presentTi*1.2)
# write edited file
cedarObj.write('Madrigal', newFile)
Python File creation/editing summary
 Creates, edits catalog, header, data records
 Hides details of Cedar file formats
– Various flavors of file format
– Use of 16 bit integers to store data
– Use of "additional increment" parameters
 See http://madrigal.haystack.edu/madrigal/pythonCedarTutorial.html for complete documentation
Cedar Database
 Outgrowth of the Madrigal Database
 A central repository
– Data persistence
– Wider variety of data
 Has model results/tools
 Wider variety of output formats
 Data not as actively updated
 Does not (yet) derive parameters
 Does not separate data by experiment
 See http://cedarweb.hao.ucar.edu/documents/dbexamples.html
Cedar – simple example
– Click on Data Services
– Click on Get/Plot Data
Cedar – select instrument
– Select instrument
Cedar instrument – part 2
– Select instrument
Cedar date – part 1
– In the next three pages you are selecting a starting day. UI is designed to ensure that only a date with data can be selected.
– Select year
Cedar date – part 2
– Select month
Cedar date – part 3
– Select starting day
– Select number of days to view
Cedar output format
– Choose output format
– Data filtering available (optional)
Cedar TAB output
– TAB format
– By default, shows all measured parameters
Cedar Database – for more info
 More complex examples at http://cedarweb.hao.ucar.edu/documents/dbexamples.html
 Contacts:
– Barbara Emery ([email protected])
– Jose Garcia ([email protected])
Arecibo database
 Simple interface focused on their data
 http://www.naic.edu/aisr/database/html/framedoc.html
 Site-specific
 Easy to use
Virtual Solar Observatory
– UI allows filtering of data. Based on uniform data model.
Virtual Space Physics Observatory
– Based on SPASE data model
– Development slowed by budget issues
National Geophysical Data Center (NOAA) – Solar Terrestrial Physics
– SPIDR provides a Virtual Observatory-like interface to many of the datasets
– See SPIDR tutorial at http://spidr.ngdc.noaa.gov/spidr/tutorial.do
Many other data sources
 World Data Center
– http://www.ngdc.noaa.gov/wdc/wdcmain.html
 Canadian Space Science Data Portal
– http://www.ssdp.ca/
 NASA,ESA satellite sites
 Magnetometer arrays
– http://www-ssc.igpp.ucla.edu/gem/worldmag/index.html
 NASA's Space Physics Data Facility
– http://spdf.gsfc.nasa.gov/
 And many more…
Summary
 Virtual Observatory concept beginning to influence
data gathering
 Future success may depend on standardization
 Submit suggestions, or write improvements to
Madrigal
– www.openmadrigal.org
– [email protected]