
The MashMyData project
Combining and comparing environmental science
data on the web
Alastair Gemmell¹, Jon Blower¹, Keith Haines¹, Stephen Pascoe²,
Phil Kershaw², Bryan Lawrence², Simon Woodman³, Hugo Hiden³
1. Reading e-Science Centre (ReSC) @ University of Reading
2. Centre for Environmental Data Archival (CEDA) @ British Atmospheric Data Centre
3. School of Computing Science @ University of Newcastle
Outline
• Background to MashMyData
• Motivation
• Challenges
• Interoperability and project architecture
• Current state of the project
• The future of the project
MashMyData Background
• NERC-funded project under the ‘Technology proof of concept’
programme
• Commenced 1st February 2010. Runs until 30th June 2011
• Aiming to present some of our later outputs at EGU 2011
• Here we introduce the project and show its current status and
plans for the future.
• Funded partners are Reading e-Science Centre (ReSC) and
the Centre for Environmental Data Archival (CEDA)
Motivation
• Environmental scientists use many diverse data sources
including:
• in-situ measurements (e.g. ocean buoys, radiosondes)
• remotely-sensed data (e.g. satellite, radar)
• numerical simulations
• However, this diversity leads to great heterogeneity in data formats and data access methods, and hence in the software needed to handle them
• We want to allow scientists from different disciplines to bridge between a variety of datasets, regardless of the underlying data formats and access methods
Technical Challenges
• To succeed, the MashMyData project must overcome a number of technical challenges
• These overlap considerably with important challenges in the wider e-Science community
• The solutions could therefore be widely applicable in future
• Challenges:
• Dealing with data diversity
• Performing calculations remotely in a way that scales
• Accessing secure data, and the delegation problem
• Enabling traceability and reproducibility
Integrating web services and technologies
• Recent discussion has highlighted the gains from re-using existing e-Science software and services
• We have identified a number of existing web services and
technologies and integrated them in the MashMyData project:
• Reading e-Science Centre’s ncWMS/Godiva2 Web Map Service
(displaying gridded environmental data)
• Centre for Environmental Data Archival’s Web Processing Service
(number crunching for compute-intensive workflows)
• Newcastle University’s e-Science Central software (upload,
workflows, versioning)
• University of Liège’s DIVA-on-web service (interpolating
geospatial point data)
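ncWMS implements the OGC Web Map Service (WMS) standard, so a client such as Godiva2 obtains map images through GetMap requests. The minimal sketch below, assuming WMS version 1.3.0 and the CRS:84 coordinate system, shows how such a request URL could be assembled from the standard GetMap parameters. The helper name is hypothetical, and the server address and layer name used with it are placeholders, not real MashMyData endpoints.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=256, height=256,
                   crs="CRS:84", image_format="image/png", time=None):
    """Build a WMS 1.3.0 GetMap request URL (hypothetical helper).

    bbox is (min_x, min_y, max_x, max_y); with CRS:84 this means
    (min_lon, min_lat, max_lon, max_lat).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty string selects the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE",   # transparent background so images can be overlaid
    }
    if time is not None:
        params["TIME"] = time    # ISO 8601 time for time-varying datasets
    # urlencode percent-encodes reserved characters such as ',', ':' and '/'
    return base_url + "?" + urlencode(params)
```

For example, `wms_getmap_url("https://example.org/ncWMS/wms", "sst", (-180, -90, 180, 90))` (hypothetical server and layer) would request a global 256 by 256 PNG image. Requesting transparent images is useful when several layers are overlaid in the browser.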
Architecture
Current project status
• The first important step was to add multi-dataset capability to the Godiva2 viewing portal
• As part of this, we have added the ability to view in-situ point data as well as gridded data
• This paves the way for mashing up datasets (e.g. producing the average or difference of existing datasets)
• The security solution is currently being engineered, as is the Web Processing Service (required for mash-up workflows)
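Once two datasets share a grid, a mash-up such as their difference or average reduces to a cell-by-cell calculation. The minimal sketch below assumes exactly that: regridding, unit checks and metadata handling are left out, and each dataset is represented as a hypothetical nested list of values in which None marks missing data (for example, land points in an ocean field). It illustrates the idea only and is not the project's WPS implementation.

```python
from typing import Callable, List, Optional

# Hypothetical representation: rows of values on a shared grid; None = missing.
Grid = List[List[Optional[float]]]


def _combine(a: Grid, b: Grid, op: Callable[[float, float], float]) -> Grid:
    """Apply op cell by cell to two grids of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("datasets must share the same grid")
    # A result cell is missing if either input cell is missing.
    return [
        [None if x is None or y is None else op(x, y) for x, y in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def difference(a: Grid, b: Grid) -> Grid:
    """Cell-by-cell a - b."""
    return _combine(a, b, lambda x, y: x - y)


def average(a: Grid, b: Grid) -> Grid:
    """Cell-by-cell mean of a and b."""
    return _combine(a, b, lambda x, y: (x + y) / 2)
```

For instance, `difference([[1.0, 2.0], [None, 4.0]], [[0.5, 2.5], [3.0, 3.0]])` gives `[[0.5, -0.5], [None, 1.0]]`: the missing value propagates rather than being silently treated as zero.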
Web Interface
• Click and drag a layer metadata box into a new position
• This changes the stacking order of the layers on the map, moving a layer towards the front or back
• The opacity of each layer can also be adjusted to reveal the layers underneath
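Layer stacking and opacity can be understood through standard alpha compositing. The minimal sketch below illustrates the general "over" operator, not Godiva2's actual rendering code: it blends one pixel's colour through a back-to-front stack of layers, each with its own opacity. The function and type names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]  # colour channels in the range [0, 1]


def composite(layers: List[Tuple[RGB, float]],
              background: RGB = (1.0, 1.0, 1.0)) -> RGB:
    """Blend layers listed back-to-front, each given as (colour, opacity).

    Uses the 'over' operator: result = opacity * layer + (1 - opacity) * below.
    """
    r, g, b = background
    for (lr, lg, lb), alpha in layers:   # later entries sit nearer the front
        r = alpha * lr + (1 - alpha) * r
        g = alpha * lg + (1 - alpha) * g
        b = alpha * lb + (1 - alpha) * b
    return (r, g, b)
```

With an opaque red layer at the back and a blue layer at 25% opacity in front, the pixel comes out as `(0.75, 0.0, 0.25)`. Swapping the two layers gives `(0.25, 0.0, 0.75)`, which is why moving a layer towards the front or back changes what the user sees.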
Interface with e-Science Central
• Users can upload data via the e-Science Central API
• They can then view the available data sources and workflows
• Users can run a given workflow on data of their choice; the workflow is then executed in e-Science Central
• This interface with e-Science Central will be invisible to users: they just know that they can upload data, view it and run workflows
• Versions and metadata for files and workflows are recorded by e-Science Central
Examples of work in progress
Further Work
• Finish integration with CEDA’s WPS (this currently works with a simple test process)
• This in turn will pave the way for adding mash-up functionality to the web interface
• Finish engineering the security solution. This will give authorised users access to secure datasets (e.g. from the Met Office)
• Continue meetings with test-case users to ensure that the system meets their needs (so far so good, but it is still relatively early days!)
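Integration with a Web Processing Service such as CEDA's generally means issuing WPS requests. The minimal sketch below assumes the OGC WPS 1.0.0 key-value-pair (HTTP GET) encoding of the Execute operation, in which DataInputs is a semicolon-separated list of name=value pairs. The helper name, the server address, the process identifier and the input names in the example are all hypothetical; the real process interface depends on the service's DescribeProcess response.

```python
from typing import Dict
from urllib.parse import quote, urlencode


def wps_execute_url(base_url: str, identifier: str, inputs: Dict[str, str]) -> str:
    """Build a WPS 1.0.0 Execute request in KVP form (hypothetical helper)."""
    params = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
    })
    # DataInputs is a ';'-separated list of name=value pairs. Names and values
    # are percent-encoded individually so the '=' and ';' separators stay literal.
    data_inputs = ";".join(
        f"{quote(name, safe='')}={quote(str(value), safe='')}"
        for name, value in inputs.items()
    )
    return f"{base_url}?{params}&DataInputs={data_inputs}"
```

For example, `wps_execute_url("https://example.org/wps", "TestProcess", {"a": "1", "b": "x;y"})` yields `https://example.org/wps?service=WPS&version=1.0.0&request=Execute&identifier=TestProcess&DataInputs=a=1;b=x%3By`. The semicolon inside the value is encoded so that it cannot be mistaken for a separator.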
Thanks!
www.mashmydata.org
(Not live yet; it currently links to our project page, including the SVN repository on Google Code)