Transcript Slide 1

UK E-Science Initiative
and its
Application to SDO
J.L. Culhane
MSSL
SUMMARY
• The UK Astrogrid
• Dealing with SDO Data Volumes
• The PPARC E-Science AO
• HMI Data Products and Pipeline
What is the Grid?
Ian Foster, Argonne National Lab & University of Chicago
“A Grid is a system that:
• Coordinates resources that are not subject to centralized control.
• Uses standard, open, general-purpose protocols and interfaces.
• Delivers nontrivial qualities of service.”
- Ian Foster, “What is the Grid? A Three Point Checklist”
[Figure: a Grid linking a network, PC, laptop, mainframe, phone/PDA, printer and space missions]
UK Astrogrid
• Astrogrid is one of three major world-wide projects
(along with the European AVO and the US-VO) that
aim to create an astronomical Virtual Observatory
• Astrogrid has a significant Solar Physics component
• The Virtual Observatory will be a set of co-operating and
interoperable software systems that:
– allow users to interrogate multiple data centres in a seamless
and transparent way;
– provide powerful new analysis and visualisation tools;
– give data centres a standard framework for publishing and
delivering services using their data.
How does Astrogrid work?
Web Service: “A web service is any piece of software that makes itself available over the
Internet and uses a standardized XML messaging system.”
- Ethan Cerami, “Top Ten FAQs for Web Services”, The O’Reilly Network
[Figure: the user works through a web interface; web services front the resources (data archive, data storage, data transformation & processing) and a distributed network of registries]
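The definition above rests on a standardized XML messaging system. As a minimal sketch using only the Python standard library, this is how a client and a web service might serialise and read back one query message. The element and attribute names (query, instrument, timeRange) are hypothetical and are not taken from any Astrogrid or IVOA schema.

```python
import xml.etree.ElementTree as ET


def build_query(instrument, start, end):
    """Client side: serialise a query as an XML message (hypothetical element names)."""
    root = ET.Element("query")
    ET.SubElement(root, "instrument").text = instrument
    window = ET.SubElement(root, "timeRange")
    window.set("start", start)
    window.set("end", end)
    return ET.tostring(root, encoding="unicode")


def parse_query(message):
    """Service side: read the same fields back out of the message."""
    root = ET.fromstring(message)
    window = root.find("timeRange")
    return {
        "instrument": root.findtext("instrument"),
        "start": window.get("start"),
        "end": window.get("end"),
    }


msg = build_query("HMI", "2010-05-01T00:00:00", "2010-05-02T00:00:00")
print(msg)
print(parse_query(msg))
```

Because both sides agree only on the message format, the client does not need to know how or where the service is implemented.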
Astrogrid Registry
Registry: “Dynamic database of metadata describing a set of
Internet-available resources. A registry is used to identify and
locate resources satisfying user-specified criteria, and to
direct more detailed information requests to the relevant
services.”
- Robert Hanisch, STScI
METADATA:
• Basic: ID, title, service type
• Curation: location, contact, publisher, creator, etc.
• Metadata: allowed methods, input/output variables, etc.
• Metadata format: wavelength, coordinates, instrument coverage…
Registries contain information about resources
[Figure: a registry database describing the data archive, data storage, data transformation and the distributed network of registries]
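A registry holds metadata about resources rather than the data themselves, and answers questions such as "which archives cover this wavelength?". The sketch below is a hypothetical in-memory stand-in, assuming a flat record whose basic, curation and coverage fields loosely follow the categories listed above; the identifiers, field names and search criteria are illustrative and are not the Astrogrid registry schema.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    identifier: str                      # basic: ID
    title: str                           # basic: title
    service_type: str                    # basic: service type
    publisher: str                       # curation
    wavelength_nm: tuple = (0.0, 0.0)    # coverage: (min, max) wavelength in nm
    instruments: set = field(default_factory=set)  # coverage: instruments


class Registry:
    """Hypothetical in-memory registry: stores metadata, not the data itself."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def search(self, service_type=None, wavelength_nm=None, instrument=None):
        """Return identifiers of resources matching every criterion given."""
        hits = []
        for r in self._resources:
            if service_type is not None and r.service_type != service_type:
                continue
            if wavelength_nm is not None:
                lo, hi = r.wavelength_nm
                if not lo <= wavelength_nm <= hi:
                    continue
            if instrument is not None and instrument not in r.instruments:
                continue
            hits.append(r.identifier)
        return hits


reg = Registry()
reg.register(Resource("ivo://example/hmi", "HMI archive", "archive", "Example DC",
                      (617.0, 618.0), {"HMI"}))
reg.register(Resource("ivo://example/xray", "X-ray archive", "archive", "Example DC",
                      (0.1, 10.0), {"XMM"}))
print(reg.search(service_type="archive", wavelength_nm=617.3))  # ['ivo://example/hmi']
```

A real registry would then direct the user to the matching service for more detailed requests; here the search simply returns identifiers.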
Solar Interior to Outer Atmosphere
Science goal: Connect observations of
the interior to fluctuations in the solar
atmosphere
Data Required: Helioseismology
observations connected with solar
atmosphere observations
Current difficulties: Being able to
search efficiently for solar atmospheric
events that may be responding to an
excitation source in the interior
Grid future: Ability to:
- Search easily for events e.g. flux
emergence, AR evolution, flares,
coronal mass ejections, over specific
time periods
- Extract parameters over the cycle
from the atmosphere and interior in
order to compare their evolution
Crucial for SDO to relate convection
zone observations to magnetic field
data for the photosphere and above
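The "Grid future" items amount to querying event lists by type and time window, then following a parameter over time. A minimal sketch, assuming events are available as simple (type, start time) records; the records and event-type labels below are hypothetical, and a monthly event count stands in for the cycle parameters mentioned above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event records (type, start time); real catalogues carry far more.
EVENTS = [
    ("flare", datetime(2010, 3, 2, 9, 15)),
    ("cme", datetime(2010, 3, 2, 11, 40)),
    ("flux_emergence", datetime(2010, 2, 20, 4, 0)),
    ("flare", datetime(2010, 4, 11, 19, 30)),
    ("ar_evolution", datetime(2010, 5, 1, 0, 0)),
]


def search_events(events, kinds, start, end):
    """Return records of the requested kinds with start <= time < end, in time order."""
    hits = [e for e in events if e[0] in kinds and start <= e[1] < end]
    return sorted(hits, key=lambda e: e[1])


def monthly_counts(events):
    """Count events per (year, month): a crude parameter to follow over the cycle."""
    return Counter((t.year, t.month) for _, t in events)


window = search_events(EVENTS, {"flare", "cme"}, datetime(2010, 3, 1), datetime(2010, 5, 1))
print(window)
print(monthly_counts(window))
```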
SDO HMI
Archiving and Processing
• SDO instruments generate raw data (~ 2 Tbyte/day)
along with derived products
• Derived products result from pipeline processing that
must keep up with the flow of incoming data
• GRID or Virtual Observatory approach could allow:
– Distributed data holding
– Distributed processing capability
• Network bandwidths and processing power at single
sites set limits:
– Available network bandwidths for users could limit data transfer
from/between multiple archives
– All data at one site implies considerable processing power
accessible by many distributed users
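The ~2 Tbyte/day figure translates directly into the sustained rate a pipeline must match to keep up with the incoming data. A back-of-envelope sketch, assuming decimal units (1 Tbyte = 10^12 bytes) and a hypothetical processing rate to show how a backlog builds when the pipeline falls behind:

```python
SECONDS_PER_DAY = 86_400


def sustained_rate(bytes_per_day):
    """Average input rate implied by a daily volume, in MB/s and Mbit/s (decimal units)."""
    bytes_per_s = bytes_per_day / SECONDS_PER_DAY
    return {"MB/s": bytes_per_s / 1e6, "Mbit/s": bytes_per_s * 8 / 1e6}


def backlog_bytes(days, bytes_per_day, processing_bytes_per_s):
    """Unprocessed data accumulated if the pipeline runs slower than the input."""
    deficit = bytes_per_day / SECONDS_PER_DAY - processing_bytes_per_s
    return max(deficit, 0.0) * SECONDS_PER_DAY * days


rate = sustained_rate(2e12)  # ~2 Tbyte/day, read as 2e12 bytes (assumption)
print(f"{rate['MB/s']:.1f} MB/s, {rate['Mbit/s']:.0f} Mbit/s")
# 20 MB/s is a hypothetical processing rate, chosen only to illustrate a shortfall.
print(f"{backlog_bytes(30, 2e12, 20e6) / 1e12:.2f} TB behind after 30 days")
```

Under these assumptions the input averages roughly 23 MB/s (about 185 Mbit/s); any shortfall in processing rate accumulates linearly as a backlog.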
Distributed Archive Approach
• Multiple copies of the data are desirable
• Needs a minimum of two geographically
separated sources, with these advantages:
– Greater resilience in ability to supply users
– Load sharing between different providers
(network and processing)
– Avoids need for single site to provide
excessive processing power
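The resilience and load-sharing advantages can be illustrated by how a client might choose between mirrors. A minimal sketch, assuming each mirror reports whether it is reachable and how many requests it is serving; the site names and the least-loaded selection rule are hypothetical, not a mechanism described for Astrogrid or SDO.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mirror:
    name: str             # hypothetical site label
    available: bool       # is the site currently reachable?
    active_requests: int  # current load, used for load sharing


def choose_mirror(mirrors) -> Optional[Mirror]:
    """Pick the least-loaded reachable mirror; None if every site is down."""
    up = [m for m in mirrors if m.available]
    if not up:
        return None
    return min(up, key=lambda m: m.active_requests)


sites = [Mirror("site-A", True, 12), Mirror("site-B", True, 3)]
print(choose_mirror(sites).name)  # site-B: both are up, B is less loaded
sites[1].available = False
print(choose_mirror(sites).name)  # site-A: B is down, A still serves users
```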
Single Archive Approach
• Solar data are normally stored in raw form and need to be
processed before use
• Processing involves extraction and calibration of
selected observations.
• For data (e.g. helioseismology data) involving extended
time intervals, processing data at source is desirable
• Advantages that result:
– Reduced amount of information to be returned to user
– Affords the instrument teams more control over the processing
and quality of their data products
but
• Heavy loading of processors at a single archive site, unless
requests are for high-level, lower-volume data products
Network Issues
• UK has “SuperJanet” backbone currently at 10
Gbps
• Local access points operate at 2.5 Gbps (e.g.
UCL interconnect rate to backbone)
• Europe has “Geant” backbone at 10 Gbps
covering UK, France, Germany, Sweden,
Switzerland with 2.5 Gbps local interconnects
• Transatlantic connection to Geant currently 2.5
Gbps with upgrade to 10 Gbps planned for 2004
• Discussion of “Global” 1 Tbps network by 2006??
• Geant driven in part by needs of HEP community
for LHC – hence SDO may not have a problem in
moving data between sites
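Combining these link rates with the ~2 Tbyte/day SDO raw data volume gives a rough check of the closing point. A back-of-envelope sketch, assuming decimal units, the nominal line rates quoted above, and a hypothetical efficiency factor standing in for protocol overhead and shared traffic:

```python
DAY_VOLUME_BYTES = 2e12  # ~2 Tbyte/day of SDO raw data, read as decimal bytes

# Nominal link rates quoted on the slide, in Gbit/s.
LINKS_GBPS = {
    "SuperJanet / Geant backbone": 10.0,
    "UK local access point": 2.5,
    "Transatlantic link to Geant": 2.5,
}


def transfer_hours(volume_bytes, link_gbps, efficiency=1.0):
    """Hours to move volume_bytes at `efficiency` x the nominal line rate (1.0 = ideal)."""
    return volume_bytes * 8 / (link_gbps * 1e9 * efficiency) / 3600


def link_fraction(volume_bytes_per_day, link_gbps):
    """Fraction of nominal capacity used by streaming the daily volume continuously."""
    return volume_bytes_per_day * 8 / 86_400 / (link_gbps * 1e9)


for name, gbps in LINKS_GBPS.items():
    print(f"{name} ({gbps} Gbps): "
          f"{transfer_hours(DAY_VOLUME_BYTES, gbps):.2f} h per day of data, "
          f"{link_fraction(DAY_VOLUME_BYTES, gbps):.1%} of capacity")
```

At the ideal line rate a day of data crosses a 2.5 Gbps link in under two hours, and streaming it continuously would use about 7% of such a link (under 2% of a 10 Gbps backbone). Real throughput will be lower, which is what the efficiency argument is for.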
PPARC E-Science AO
• Proposals due by 31st May, 2003
• Existence of first level Astrogrid infrastructure
assumed
• Proposals should:
– Be for the application of infrastructure and related
techniques to “real” data sets
– Underpin science: a close connection between
projects and the science programme is essential
– Demonstrate an enabling role for eventual science
exploitation
– Ensure development of standards and deployment of
Grid infrastructure
• An SDO bid is now anticipated by PPARC
HMI Data Analysis Pipeline
[Figure: HMI Science Data Analysis Plan, Version 1.2w (HMI SRR/SCR Presentation April 8-10). HMI data (filtergrams) pass through processing (enabling code/algorithms) to data products for science exploitation, with net access/mirroring of the data. Labelled stages, grouped by observable:]
• Doppler velocity: heliographic Doppler velocity maps; tracked tiles of Dopplergrams; spherical harmonic time series to l=1000; mode frequencies and splitting; ring diagrams; local wave frequency shifts; time-distance cross-covariance function; wave travel times; egression and ingression maps; wave phase shift maps
– Products: internal rotation Ω(r,Θ) and internal sound speed cs(r,Θ) (0<r<R); full-disk velocity v(r,Θ,Φ) and sound speed cs(r,Θ,Φ) maps (0-30 Mm); Carrington synoptic v and cs maps (0-30 Mm); high-resolution v and cs maps (0-30 Mm); deep-focus v and cs maps (0-200 Mm); far-side activity index
• Stokes I,V and Stokes I,Q,U,V: line-of-sight magnetograms; full-disk 10-min averaged maps; tracked tiles; vector magnetograms (fast algorithm); vector magnetograms (inversion algorithm)
– Products: line-of-sight magnetic field maps; vector magnetic field maps; coronal magnetic field extrapolations; coronal and solar wind models
• Continuum brightness: tracked full-disk 1-hour averaged continuum maps
– Products: solar limb parameters; brightness feature maps; brightness images
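Because each product depends on earlier stages, the pipeline can be modelled as a dependency graph, and stages whose inputs are complete can run in parallel, possibly at different sites. A minimal sketch using Python's standard graphlib.TopologicalSorter (Python 3.9+). It covers only part of the diagram, the stage names are shortened labels, and the dependency edges are an assumed reading of the figure rather than the HMI team's definition.

```python
from graphlib import TopologicalSorter

# Stage -> set of stages it needs. A partial, assumed reading of the diagram.
PIPELINE = {
    "filtergrams": set(),
    "doppler_velocity": {"filtergrams"},
    "stokes_iv": {"filtergrams"},
    "heliographic_doppler_maps": {"doppler_velocity"},
    "tracked_dopplergram_tiles": {"doppler_velocity"},
    "spherical_harmonic_time_series": {"heliographic_doppler_maps"},
    "mode_frequencies_and_splitting": {"spherical_harmonic_time_series"},
    "internal_rotation_and_sound_speed": {"mode_frequencies_and_splitting"},
    "ring_diagrams": {"tracked_dopplergram_tiles"},
    "local_wave_frequency_shifts": {"ring_diagrams"},
    "line_of_sight_magnetograms": {"stokes_iv"},
}


def parallel_batches(graph):
    """Group stages into batches; every stage in a batch has all its inputs done."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)  # mark the whole batch finished before asking for more
    return batches


for i, batch in enumerate(parallel_batches(PIPELINE)):
    print(i, batch)
```

Each batch is a set of independent jobs, which is the unit a distributed processing scheme could farm out to different sites.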
HMI Data Volumes
[Figure: HMI data volumes and net access; not reproduced]
END OF TALK
What is Astrogrid?
Astrogrid is a £5M data grid project that will link data archives, resources, and disciplines
from UK space institutions into a virtual observatory.
Resources
• Datasets
• Processors
• Storage
• Other virtual observatories
Data Archives
• Mullard Space Science Laboratory
• Rutherford Appleton Laboratory
• University of Cambridge
• University of Leicester
• Royal Observatory Edinburgh
• Queen's University Belfast
• Jodrell Bank Observatory
Disciplines
• Astrophysics
• Solar Physics
• Solar Terrestrial Physics
GRID/Virtual Observatory
Within a virtual observatory:
• It is not necessary for all datasets to be stored at a single site
• Metadata and registries allow the system to handle a distributed archive.
• Different organisations or countries could host the different datasets
or different parts of the datasets (e.g. split by time).
• Complete catalogues relating to particular datasets should be held
wherever the data are held.
• Distributed data holding reduces the pressure on:
– Network connection to an archive
– Processing capabilities needed at the archive site
• The most frequently accessed data could be selectively copied to distributed
archives, e.g. EGSO, Astrogrid
• Derived data products should be held at distributed sites
• Material needed for more detailed searches should be described by
metadata in appropriate registries.
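When a dataset is split by time across sites, the metadata must let the system work out which sites to contact for a requested interval. A minimal sketch, assuming the segments are sorted by start date and each site holds a contiguous segment ending where the next begins; the site names and dates are hypothetical.

```python
from datetime import date

# Hypothetical holdings: (segment start, site). Each segment runs until the next
# segment's start; the last one is open-ended. Must be sorted by start date.
SPLITS = [
    (date(2010, 1, 1), "site-A"),
    (date(2011, 1, 1), "site-B"),
    (date(2012, 1, 1), "site-C"),
]


def sites_for_interval(splits, start, end):
    """Return the sites whose segments overlap the half-open interval [start, end)."""
    hits = []
    for i, (seg_start, site) in enumerate(splits):
        seg_end = splits[i + 1][0] if i + 1 < len(splits) else date.max
        if seg_start < end and start < seg_end:
            hits.append(site)
    return hits


print(sites_for_interval(SPLITS, date(2010, 6, 1), date(2011, 6, 1)))  # ['site-A', 'site-B']
```

A request spanning a split boundary is thus routed to both holding sites, and the user need not know where the boundary lies.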
Example: Solar / Stellar Flares
Science Problem: A solar physicist studying the flare mechanism would like to
gather data on both solar and stellar flares.
Data Required: X-ray datasets: lightcurves, spectra, and redshift / blueshift
information from SOHO, Yohkoh, EXOSAT, ROSAT, XMM, Chandra, etc.
Current Issues: No stellar flare catalogue existed (at the time the science problem was written), and
the datasets are provided by several different archives with no common interface.
[Figure: a user and web interface linked to Solar Flare Catalogues #1 and #2, the Yohkoh and Solar-B archives, a merged solar flare list, the XMM and Chandra archives, and a NEW stellar flare catalogue]
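Building the merged solar flare list means recognising the same flare when two catalogues both report it. A minimal sketch, assuming each catalogue is a list of flare peak times and that peaks within a fixed tolerance (a hypothetical 10 minutes) are the same event; real cross-matching would also compare location and flare class, and the catalogue labels are illustrative.

```python
from datetime import datetime, timedelta


def merge_flare_lists(cat1, cat2, tolerance=timedelta(minutes=10)):
    """Merge two lists of peak times into [peak_time, sources] entries.

    The earliest time in a matched group is kept as the event time (an assumption).
    """
    tagged = sorted([(t, "cat1") for t in cat1] + [(t, "cat2") for t in cat2])
    merged = []
    for t, src in tagged:
        if merged and t - merged[-1][0] <= tolerance:
            merged[-1][1].add(src)  # same flare, seen by another catalogue
        else:
            merged.append([t, {src}])
    return merged


cat1 = [datetime(2010, 1, 1, 10, 0), datetime(2010, 1, 1, 12, 0)]
cat2 = [datetime(2010, 1, 1, 10, 4), datetime(2010, 1, 1, 15, 0)]
for peak, sources in merge_flare_lists(cat1, cat2):
    print(peak, sorted(sources))
```

With this simple rule, two distinct flares from the same catalogue that fall within the tolerance would also be merged, so the tolerance has to be chosen with the catalogues' timing accuracy in mind.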
HMI Data Archive
HMI Data Flow
HMI Dataflow Concept
HMI Standard Data Products
UK Astrogrid Scientific Aims
• Improve the quality, efficiency, ease, speed, and
cost-effectiveness of on-line astronomical
research
• Make comparison and integration of data from
diverse sources seamless and transparent
• Remove data analysis barriers to
interdisciplinary research
• Make science involving manipulation of large
datasets as easy and as powerful as possible.
UK Astrogrid Practical Goals
• Develop, with our IVOA partners (including European Grid of Solar
Observations/EGSO), internationally agreed standards for data,
metadata, data exchange and provenance
• Develop a software infrastructure for data services
• Establish a physical grid of resources shared by AstroGrid and key
data centres
• Construct and maintain an AstroGrid Service and Resource Registry
• Implement a working Virtual Observatory system based around key
UK databases and of real scientific use to astronomers
• Provide a user interface to that VO system
• Provide, either by construction or by adaptation, a set of science
user tools to work with that VO system
• Establish a leading position for the UK in VO work