Transcript Slide 1
The MashMyData project
Combining and comparing environmental science data on the web

Alastair Gemmell (1), Jon Blower (1), Keith Haines (1), Stephen Pascoe (2), Phil Kershaw (2), Bryan Lawrence (2), Simon Woodman (3), Hugo Hiden (3)
1. Reading e-Science Centre (ReSC) @ University of Reading
2. Centre for Environmental Data Archival (CEDA) @ British Atmospheric Data Centre
3. School of Computing Science @ University of Newcastle

Outline
• Background to MashMyData
• Motivation
• Challenges
• Interoperability and project architecture
• Current state of the project
• The future of the project

MashMyData Background
• NERC-funded project under the ‘Technology proof of concept’ programme
• Commenced 1st February 2010. Runs until 30th June 2011
• Aiming to present some of our later outputs at EGU 2011
• Here we introduce the project and show its current status and plans for the future.
• Funded partners are Reading e-Science Centre (ReSC) and the Centre for Environmental Data Archival (CEDA)

Motivation
• Environmental scientists use many diverse data sources including:
  • in-situ measurements (e.g. ocean buoys, radiosondes)
  • remotely-sensed data (e.g. satellite, radar)
  • numerical simulations
• However this results in much heterogeneity of data formats, data access methods, and thus suitable software
• We want to allow scientists from different disciplines to bridge between a variety of datasets regardless of the underlying data formats etc.

Technical Challenges
• The MashMyData project is faced with a number of challenges in order to be successful
• Much overlap between these challenges and a number of important challenges in the wider e-Science community
• The solutions will potentially be widely applicable in the future
• Challenges:
  • Dealing with data diversity
  • Performing calculations remotely in a way that scales
  • Accessing secure data, and the delegation problem
  • Enabling traceability and reproducibility

Integrating web services and technologies
• Recent discussion on the gains of re-using existing e-Science
• We have identified a number of existing web services and technologies and integrated them in the MashMyData project:
  • Reading e-Science Centre’s ncWMS/Godiva2 Web Map Service (displaying gridded environmental data)
  • Centre for Environmental Data Archival’s Web Processing Service (number crunching for compute-intensive workflows)
  • Newcastle University’s e-Science Central software (upload, workflows, versioning)
  • University of Liege’s DIVA-on-web service (interpolating geospatial point data)

Architecture

Current project status
• First important step was to add multi-dataset capability to the Godiva2 viewing portal.
• As part of this we have added the ability to view in-situ point data as well as gridded data
• This paves the way for mashing up datasets (e.g. producing the average or difference of existing datasets)
• Security is currently being engineered, as is the Web Processing Service (required for mash-up workflows)

Web Interface
• Click and drag a layer metadata box into a new position
• This alters the layer stacking on the map to enable layers to be moved towards the front or back
• The opacity of the layers can also be modified to reveal those underneath

Web Interface

Interface with e-Science Central
• User can upload data via the e-Science Central API.
• Thereafter they can view available data sources and workflows
• User can run a given workflow on data of their choice, and this will execute the workflow in e-Science Central
• This interface with e-Science Central will be invisible to the user – they just know they can upload data, view it and run workflows
• File and workflow versioning and metadata are recorded by e-Science Central

Examples of work in progress

Further Work
• Finish integration with CEDA’s WPS (currently works with a simple test process)
• This in turn will pave the way for adding mash-up functionality to the web interface.
• Finish engineering the security solution. This will allow access to secure datasets (e.g. Met Office) for certain authorised users.
• Continue meetings with test case users to ensure that the system meets their needs (so far so good but relatively early days!)

Thanks!
[email protected]
www.mashmydata.org (Not live yet, but currently links to our project page including svn on Google Code)
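
Illustrative sketch (not from the slides)
The mash-up operations mentioned above (the average or difference of existing datasets) might look roughly like the following in the simplest case. This is a minimal sketch that assumes both datasets have already been brought onto the same regular grid. The name `combine`, the list-of-lists grid layout and the use of None for missing cells are assumptions made for illustration, not part of the MashMyData code or of the CEDA Web Processing Service.

```python
from typing import List, Optional

# A grid is a list of rows; None marks a missing value (an assumption for this sketch).
Grid = List[List[Optional[float]]]


def combine(a: Grid, b: Grid, op: str = "difference") -> Grid:
    """Combine two grids cell by cell; both must share the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    if op not in ("difference", "average"):
        raise ValueError(f"unsupported operation: {op}")

    def cell(x: Optional[float], y: Optional[float]) -> Optional[float]:
        # A cell missing in either input stays missing in the result.
        if x is None or y is None:
            return None
        return x - y if op == "difference" else (x + y) / 2.0

    return [[cell(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


if __name__ == "__main__":
    # Made-up values standing in for, say, a model field and a satellite field.
    model = [[10.0, 11.0], [12.0, None]]
    satellite = [[9.5, 11.5], [12.0, 13.0]]
    print(combine(model, satellite, "difference"))
    print(combine(model, satellite, "average"))
```

In a real mash-up the harder part is likely to be what this sketch skips: regridding datasets with different resolutions, projections and time axes before they can be compared cell by cell.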