Data driven research in Earth and Environmental Sciences

Joining the Dots

Managing and identifying geolocated data by DOIs and IGSNs

Jens Klump | OCE Science Leader Earth Science Informatics 20 August 2014

MINERAL RESOURCES FLAGSHIP

TERENO Terrestrial Environmental Observatory

Managing data in environmental monitoring

What is TERENO?

• TERENO is an infrastructure initiative by the Helmholtz Association to provide an environmental monitoring infrastructure for the scientific community.
• Construction started in 2008; operation is planned for 25 years.
• 4 regional observatories
• TERENO Northeast has
  • 8 study sites
  • 32 platforms
  • Approx. 35 M data entries from various sensors
  • More platforms being added
• The other three regional observatories are of similar scale.

3 | Joining the dots | Jens Klump

Regions of high climate vulnerability

Regions of high vulnerability:
• Droughts
• Heat waves
• Floods
• Winter storms
• Loss of biodiversity
• Landslides

From: Rüdiger Glaser (2008), Klimageschichte Mitteleuropas: 1200 Jahre Wetter, Klima, Katastrophen

TERENO Regional Observatories

• Northeastern German Lowland Observatory
  • Coordination: GFZ
• Harz / Central German Lowland Observatory
  • Coordination: UFZ
• Eifel / Lower Rhine Valley Observatory
  • Coordination: FZJ
• Bavarian Alps / pre-Alps Observatory
  • Coordination: HMGU and KIT

TERENO Research Goals

• Investigate interactions and feedbacks between different compartments: atmosphere, terrestrial biosphere, terrestrial hydrosphere and pedosphere.
• Bridge the gap between measurement, model and management.

TERENO Northeast

Combination of geoarchives with process observations

• Regional impacts of Global Change on near-natural terrestrial ecosystems and landscapes in space and time
• Integrated system analysis of climate and landscape development / process understanding
• Combination of real-time process observations (e.g. soil moisture, hydrology, vegetation) and evaluation of geoarchives (lacustrine sediments, colluvial deposits, peats, soils)

[Figure labels: Remote sensing, Field observation, Geoarchive]

TERENO data management

System architecture

TERENO data portal

Looking Ahead: Future Directions

Data Driven Research in the Geological Sciences

Identifiers for software

• Like data and specimens, software should be identifiable in a persistent way.
  • Establish the missing link between papers and data.
  • Make software recognisable as a scientific achievement.
  • Make science more transparent and reproducible.
• Simply assigning a DOI to software is a good start but might not be good enough.
• Again, we encounter the question of identity (version) and location (repository).

www.sciforge-project.org
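
The distinction between identity (version) and location (repository) can be illustrated with a minimal Python sketch. It is not taken from the talk or from any particular registry: the class names, the identifier strings under the placeholder prefix 10.5555 and the example.org repository URLs are all hypothetical. The idea shown is that each released version gets its own persistent identifier, while the location it resolves to may change without affecting that identifier.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class SoftwareVersion:
    """One released version of a piece of software (hypothetical model)."""
    identifier: str  # persistent identifier: names exactly this version
    name: str
    version: str
    location: str    # where the code currently lives; allowed to change


class SoftwareRegistry:
    """Toy in-memory resolver mapping identifiers to version records."""

    def __init__(self) -> None:
        self._records: Dict[str, SoftwareVersion] = {}

    def register(self, record: SoftwareVersion) -> None:
        # An identifier must never be reused for a different version.
        if record.identifier in self._records:
            raise ValueError(f"identifier already assigned: {record.identifier}")
        self._records[record.identifier] = record

    def relocate(self, identifier: str, new_location: str) -> None:
        # Only the location changes; identifier and version stay fixed.
        old = self._records[identifier]
        self._records[identifier] = replace(old, location=new_location)

    def resolve(self, identifier: str) -> SoftwareVersion:
        return self._records[identifier]
```

In this sketch a citation would carry only the identifier; moving the code to another repository is a call to `relocate`, and existing citations keep resolving to the same version.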

Managing Data from Sensor Networks

Working with very large data sets

• Some data sets are too large to be inspected in detail, or even to be loaded on a desktop PC.
• Example: How would one check three years of meteorological radar data for anomalies?
• Data mining today mainly involves numerical and textual media.
• Processing will have to move from the desktop to the cloud for large data sets.
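
One way to check a data set that does not fit in memory is to stream it instead of loading it. The Python sketch below is an assumption for illustration, not the method used in the talk: it computes mean and standard deviation in a single pass with Welford's algorithm and then flags values more than a chosen number of standard deviations from the mean. The function names and the simple threshold rule are hypothetical, and real radar data would be multi-dimensional and need a domain-specific notion of anomaly.

```python
import math
from typing import Callable, Iterable, List, Tuple


def running_stats(values: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample standard deviation) in one pass.

    Uses Welford's online algorithm, so only three numbers are kept
    in memory regardless of how many values are streamed.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, std


def find_anomalies(
    make_stream: Callable[[], Iterable[float]],
    threshold: float = 3.0,
) -> List[Tuple[int, float]]:
    """Flag (index, value) pairs further than threshold * std from the mean.

    make_stream must return a fresh iterator each time it is called,
    because the data are read twice: once for statistics, once to flag.
    """
    _, mean, std = running_stats(make_stream())
    if std == 0.0:
        return []
    return [
        (i, x)
        for i, x in enumerate(make_stream())
        if abs(x - mean) > threshold * std
    ]
```

Here `make_stream` could wrap a file reader or a query against a remote store, so the same logic can run next to the data rather than on a desktop PC.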

Linked Data

1. Use URIs to denote things.

2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.

3. Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF, SPARQL.

4. Include links to other related things (using their URIs) when publishing data on the Web.
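
The four principles above can be sketched without any network access. The Python example below is a toy model under stated assumptions: the example.org URIs, the vocabulary terms under example.org/vocab and the data themselves are hypothetical, and "dereferencing" is simulated by a lookup in an in-memory list of subject, predicate, object triples rather than by an HTTP request. It shows how a user agent can start from one URI and discover related things by following links.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject URI, predicate URI, object)

# Hypothetical HTTP URIs denoting things (principles 1 and 2).
DATASET = "http://example.org/dataset/42"
SAMPLE = "http://example.org/sample/1"
SITE = "http://example.org/site/northeast"

# Hypothetical statements; objects that are URIs act as links (principle 4).
TRIPLES: List[Triple] = [
    (DATASET, "http://purl.org/dc/terms/title", "Soil moisture time series"),
    (DATASET, "http://example.org/vocab/derivedFrom", SAMPLE),
    (SAMPLE, "http://example.org/vocab/collectedAt", SITE),
    (SITE, "http://www.w3.org/2000/01/rdf-schema#label", "TERENO Northeast"),
]


def dereference(uri: str) -> List[Triple]:
    """Simulate looking up a URI: return what is known about it (principle 3)."""
    return [t for t in TRIPLES if t[0] == uri]


def linked_uris(uri: str) -> List[str]:
    """Return the URIs that the description of `uri` links to."""
    return [obj for _, _, obj in dereference(uri) if obj.startswith("http://")]


def crawl(start: str) -> List[str]:
    """Follow links breadth-first from `start`; return URIs in visiting order."""
    seen: List[str] = []
    queue = [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen
```

Starting from the data set URI, `crawl(DATASET)` reaches the sample and then the site, which is the kind of traversal a user agent performs when each description includes links to related things.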

Summary

• Persistent identifiers allow us to publish, cite and identify data, specimens and software.
• Data publication is becoming more common.
• The principles of data identification can also be applied to materials (e.g. IGSN) and software.
• Future publications might consist of elements linked by identifiers:
  • Interpretation (“Paper”)
  • Data
  • Materials
  • Software and workflows
• More and more data repositories offer APIs based on linked data.
• Future data “publication” will cater for both people and user agents.

Thank you

Mineral Resources Flagship

Jens Klump OCE Science Leader Earth Science Informatics

t +61 8 6436 8828
e [email protected]
w www.csiro.au
