TWC AGU Fall Meeting 2013, San Francisco, CA Ontology Development for Provenance Tracing in National Climate Assessment of the US Global Change Research Program Xiaogang.


TWC
AGU Fall Meeting 2013, San Francisco, CA
Ontology Development for Provenance Tracing in
National Climate Assessment of
the US Global Change Research Program
Xiaogang Ma a, Jin Guang Zheng a, Justin Goldstein b,c,
Linyun Fu a, Brian Duggan b,c, Patrick West a,
Jun Xu a, Chengcong Du a, Anusha Akkiraju a,
Steve Aulenbach b,c, Curt Tilmes c,d, Peter Fox a
a Tetherless World Constellation, Rensselaer Polytechnic Institute; b University Corporation for Atmospheric Research; c U.S. Global Change Research Program; d NASA Goddard Space Flight Center
Background
• United States Global Change Research Program (USGCRP): An interagency program that coordinates and integrates Federal research on changes in the global environment and their implications for society
• National Climate Assessment (NCA): An assessment conducted under the auspices of the Global Change Research Act of 1990, which requires a report to the President and the Congress every four years that evaluates, integrates and interprets the findings of the USGCRP with the intent to advance an inclusive and sustained process for assessing and communicating scientific knowledge of the impacts, risks and vulnerabilities associated with a changing global climate in support of decision making across the United States
• Global Change Information System (GCIS): An information system under development through the USGCRP that establishes data interfaces and interoperable repositories of climate and global change data which can be easily and efficiently accessed, integrated with other data sets, maintained over time and expanded as needed into the future
From: The National Global Change Research Plan 2012 - 2021
Collaborators
National Science and Technology Council (NSTC)
Committee on Environment, Natural Resources and Sustainability (CENRS)
White House Office of Science and Technology Policy (OSTP)
Subcommittee on Global Change Research (SGCR)
U.S. Global Change Research Program (USGCRP)
GCIS: Information Model and Semantic Application Prototypes (GCIS-IMSAP)
Global Change Information System (GCIS)
National Climate Assessment (NCA)
National Climate Assessment Development Advisory Committee (NCADAC)
What we do
• Ongoing: provenance* for the NCA3** report
• Future: provenance of publications, datasets,
models, organizations, instruments, experiments,
people, etc. eventually covering the entire scope
of global change
* Provenance - Information about entities, activities, people and
organizations involved in the production of the research findings and the
supporting datasets and methods (cf. Moreau and Missier, 2013)
** NCA3 - The National Climate Assessment Development Advisory
Committee (NCADAC) engaged more than 240 authors in the creation of
the third NCA (NCA3) report, which is to be released in early 2014
An example
“Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
Remote sensing sensors, platforms, and
instruments are used in global change research
Image source: Yang et al., 2013.
Nature Climate Change
An example question of provenance tracing:
What are NASA contributions to Figure 1.2 in the draft NCA3?
“Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
Ontology Development for Provenance Tracing in the third National Climate Assessment
The third National Climate Assessment Report (NCA3)
Provenance – Information about entities, activities, people and organizations involved in the production of the research findings and the supporting datasets and methods
Ontology – In this work the ontology (GCIS ontology) is a conceptual model of classes, properties and instances that can be used to capture provenance information in the NCA3
Image courtesy of nature.com
Method: a use case-driven
iterative approach
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
Identifies:
• goals/objectives to be accomplished
• resources to be used to achieve these objectives
• methods to be used to produce the desired results
A template for documenting use cases:
http://tw.rpi.edu/media/2013/07/25/ae99/UseCase_Template_SeS.doc
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
A facilitator:
• sets and monitors direction
• provides guidance for scoping the use case
• milestones for implementation
Team formation: domain experts, data and information
producers, knowledge and information modelers,
software engineers, and a scribe.
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
In the GCIS-IMSAP work we used:
• Group meeting: Titanpad, Skype, GoToMeeting
• Conceptual modeler: CMapTools
• Ontology editor: Protege, Notepad++
• Ontology documentation: LODE, Parrot
• Evolution environment: TopBraid
• Validator/Browser: ELDA, S2S
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
Provenance-explicit use cases
The first use case
• Title: Visit data center website of dataset used to generate a report
figure
• Actor and system: a reader of the draft NCA3 on the GCIS website
• Flow of interactions: A reader wishes to identify the source of the data
used to produce a particular figure in the draft NCA3. A reference to
the paper in which the image contained in this figure was originally
published appears in the figure caption. Clicking that reference
displays a page of metadata information about the paper, including
links to the datasets used in that paper. Pursuing each of those links
presents a page of metadata information about the dataset, including
a link back to the agency/data center web page describing the
dataset in more detail and making the actual data available for order
or download.
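The flow of interactions above amounts to following links from a figure to its cited paper, to the datasets used, and finally to a data center page. A minimal sketch of that navigation in Python follows; the record identifiers, field names and URL are hypothetical placeholders, not GCIS identifiers or the GCIS data model.

```python
# Hypothetical records: every identifier, field name and URL here is made up
# for illustration and does not come from the GCIS.
CATALOG = {
    "figure/example": {"type": "Figure", "cites": ["paper/example"]},
    "paper/example": {"type": "Paper", "uses_dataset": ["dataset/example"]},
    "dataset/example": {
        "type": "Dataset",
        "landing_page": "https://datacenter.example.org/dataset",
    },
}


def data_center_pages(figure_id, catalog):
    """Follow figure -> cited papers -> datasets -> data center pages."""
    pages = []
    for paper_id in catalog[figure_id].get("cites", []):
        for dataset_id in catalog[paper_id].get("uses_dataset", []):
            page = catalog[dataset_id].get("landing_page")
            if page is not None:
                pages.append((paper_id, dataset_id, page))
    return pages


for paper, dataset, page in data_center_pages("figure/example", CATALOG):
    print(paper, "->", dataset, "->", page)
```

Each hop corresponds to one click in the use case, so the classes (figure, paper, dataset) and the links between them are the candidates for ontology classes and properties.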
An intuitive concept map of the use case
Classes and properties recognized from the use case
From an intuitive model to an ontology:
(1) A defined class or property should be meaningful and robust
enough to meet the requirements of various use cases
(2) An ontology can be extended by adding classes and properties
recognized from new use cases through the iterative approach
The second use case
• Title: Identify roles of people in the generation of a chapter in the draft
NCA3
• Actor and system: a viewer of the GCIS website
• Flow of interactions: A viewer sees that Chapter 6 (Agriculture) in the
draft NCA3 was written by a group of authors mentioned in a list. On
the title page of that chapter the reader can view the role of each
author, e.g., convening lead author, lead author or contributing
author, in the generation of this report chapter.
• We decided to use the PROV-O ontology to describe this use case
The three Starting Point classes
in PROV-O ontology and the
properties that relate them
Source: http://www.w3.org/TR/prov-o/
Mapping the use case into PROV-O
[Figure: “Author of Chapter 6”, “Chapter 6 in NCA3” and “Writing of Chapter 6 in NCA3” are each linked by an isA relation to a PROV-O class]
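The mapping can be written down as a handful of triples. The sketch below assumes the natural reading of the diagram: the chapter is a prov:Entity, the writing is a prov:Activity and the author is a prov:Agent. The `prov:` terms are real PROV-O Starting Point terms; the `ex:` names are hypothetical and are not taken from the GCIS ontology.

```python
# Triples as (subject, predicate, object) tuples using prefixed names.
# "ex:" is a hypothetical namespace used only for this illustration.
TRIPLES = {
    ("ex:chapter6", "rdf:type", "prov:Entity"),
    ("ex:writingOfChapter6", "rdf:type", "prov:Activity"),
    ("ex:authorOfChapter6", "rdf:type", "prov:Agent"),
    # The three Starting Point classes are related by these properties.
    ("ex:chapter6", "prov:wasGeneratedBy", "ex:writingOfChapter6"),
    ("ex:writingOfChapter6", "prov:wasAssociatedWith", "ex:authorOfChapter6"),
    ("ex:chapter6", "prov:wasAttributedTo", "ex:authorOfChapter6"),
}


def objects(subject, predicate, graph):
    """Return all objects of the given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects("ex:chapter6", "prov:wasGeneratedBy", TRIPLES))
```

A real implementation would more likely use an RDF library and full IRIs; plain tuples are used here only to keep the sketch dependency-free.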
Roles of agents in an
activity in PROV-O
Source: http://www.w3.org/TR/prov-o/
Mapping roles of chapter authors into PROV-O
[Figure: “Author of Chapter 6” and “Writing of Chapter 6 in NCA3” with isA links; the roles shown are convening lead author, lead author and contributing author]
Roles of people in the activity ‘Writing of Chapter 6’
Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter.
We used PROV-O for describing roles of agents in an activity
We can also describe roles of agents for an entity
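In PROV-O, the role an agent plays in an activity is expressed through a qualified association: an intermediate prov:Association node that points to the agent (prov:agent) and to the role (prov:hadRole). The sketch below builds that pattern for three authors; the `ex:` author, role and node names are hypothetical, and the role instances actually collected in the GCIS ontology may be named differently.

```python
def qualified_association(activity, agent, role, node_id):
    """Return triples tying an agent to an activity with an explicit role."""
    return [
        (activity, "prov:qualifiedAssociation", node_id),
        (node_id, "rdf:type", "prov:Association"),
        (node_id, "prov:agent", agent),
        (node_id, "prov:hadRole", role),
        (role, "rdf:type", "prov:Role"),
    ]


def roles_in(activity, graph):
    """Map each agent to the role it played in the given activity."""
    result = {}
    for s, p, node in graph:
        if s == activity and p == "prov:qualifiedAssociation":
            agent = next(o for s2, p2, o in graph if s2 == node and p2 == "prov:agent")
            role = next(o for s2, p2, o in graph if s2 == node and p2 == "prov:hadRole")
            result[agent] = role
    return result


# Hypothetical authors and role instances, for illustration only.
AUTHORS = [
    ("ex:authorA", "ex:conveningLeadAuthor"),
    ("ex:authorB", "ex:leadAuthor"),
    ("ex:authorC", "ex:contributingAuthor"),
]

GRAPH = []
for index, (person, role) in enumerate(AUTHORS):
    GRAPH += qualified_association("ex:writingOfChapter6", person, role, f"ex:assoc{index}")

print(roles_in("ex:writingOfChapter6", GRAPH))
```

For roles relative to an entity, PROV-O offers the parallel prov:qualifiedAttribution property with a prov:Attribution node. Whether the GCIS ontology attaches the role to that node in the same way is an assumption here, not something the slides state.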
Roles of people for the entity ‘Chapter 6: Agriculture’
Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter.
More instances of prov:Role collected in the GCIS ontology
Re-using existing ontologies for the GCIS ontology
By such mappings we can use reasoners that are suitable for the PROV-O
ontology, and thus retrieve provenance graphs from the established GCIS
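One way such a mapping pays off is through subclass inference: once a domain class is declared a subclass of a PROV-O class, any PROV-aware query also finds its instances. A minimal sketch of that inference in plain Python follows, assuming rdfs:subClassOf-style declarations; the `ex:` class and instance names are hypothetical and not taken from the GCIS ontology.

```python
# child class -> list of direct parent classes (rdfs:subClassOf in RDF terms).
# The "ex:" names are hypothetical placeholders.
SUBCLASS_OF = {
    "ex:Chapter": ["prov:Entity"],
    "ex:Figure": ["prov:Entity"],
    "ex:Writing": ["prov:Activity"],
}

# instance -> asserted class
TYPES = {
    "ex:chapter6": "ex:Chapter",
    "ex:figure1_2": "ex:Figure",
    "ex:writingOfChapter6": "ex:Writing",
}


def superclasses(cls, subclass_of):
    """Return every direct or indirect superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(target, types, subclass_of):
    """Return instances whose asserted class is target or a subclass of it."""
    return sorted(
        instance
        for instance, cls in types.items()
        if cls == target or target in superclasses(cls, subclass_of)
    )


print(instances_of("prov:Entity", TYPES, SUBCLASS_OF))
```

A production reasoner does considerably more than this; the sketch only shows why asserting the subclass links is enough for PROV-O tooling to see domain instances.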
The third use case
• Title: Provenance tracing of NASA contributions to Figure 1.2 in the
draft NCA3
• Actor and system: a viewer of the GCIS website
• Flow of interactions: A viewer sees that the caption of Figure 1.2 “Sea
Level Rise: Past, Present and Future” of the draft NCA3 cites four
data sources. Selecting the third citation displays a page of
information about the cited paper and a citation to the dataset used in
that paper. Information about the dataset includes a formal
description of its origin, that is, the dataset is derived from data
produced by the TOPEX/Poseidon and Jason altimeter missions
funded by NASA and CNES. Clicking a link to each of these missions
presents a page about the platforms, instruments and sensors in that
mission.
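The tracing described in this use case is a walk over a provenance graph from the figure down to the missions, keeping the paths that end at something funded by the agency of interest. The sketch below assumes a simplified graph: “paper/103” is the identifier shown in the slides, while the other node identifiers and the edge and funding tables are hypothetical stand-ins for the GCIS data.

```python
# Simplified provenance graph. Only "paper/103" is an identifier from the
# slides; the remaining node names are hypothetical.
EDGES = {
    "figure/1.2": ["paper/103"],
    "paper/103": ["dataset/sea-level"],
    "dataset/sea-level": ["mission/topex-poseidon", "mission/jason"],
}

# The use case states that both missions were funded by NASA and CNES.
FUNDED_BY = {
    "mission/topex-poseidon": {"NASA", "CNES"},
    "mission/jason": {"NASA", "CNES"},
}


def trace_contributions(start, agency, edges, funded_by):
    """Depth-first search for every path from start to a node funded by agency."""
    paths = []

    def visit(node, path):
        path = path + [node]
        if agency in funded_by.get(node, ()):
            paths.append(path)
        for nxt in edges.get(node, ()):
            if nxt not in path:  # guard against cycles
                visit(nxt, path)

    visit(start, [])
    return paths


for path in trace_contributions("figure/1.2", "NASA", EDGES, FUNDED_BY):
    print(" -> ".join(path))
```

Extending the graph with platform, instrument and sensor nodes under each mission would let the same search answer the finer-grained questions the use case mentions.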
“Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
Provenance tracing of NASA contributions to Figure 1.2 in draft NCA3
(a) Instances of calibration, model and software underpinning “paper/103”. Here only the details of one paper (i.e., “paper/103”) cited by that figure are shown.
(b) Instances of sensor, instrument and platform underpinning that paper. Here only the details of the Topex-Poseidon mission are shown.
Current result
• GCIS ontology version 1.1
– http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology
– Ontology documentation
– Conceptual map
– Ontology RDF
gcis ontology rpi
• We have had, and will have, more use cases, and
• New versions of the GCIS ontology
Current result: GCIS ontology version 1.1
(a) Classes and properties representing a brief structure of the draft NCA3
GCIS ontology version 1.1
(b) Classes and properties related to the findings of the draft NCA3 and each chapter in it
GCIS ontology version 1.1
(c) Classes and properties about sensors, instruments, platforms, and algorithms, etc. that datasets are derived from
A few classes are asserted as
sub-classes of “prov:Entity” and
“prov:Activity”, respectively
Wrap up
• The use case-driven iterative method bridges the gap between
Semantic Web researchers and Earth and environmental
scientists
– It supports rapid deployment in Semantic Web application
development
• First-hand experience in re-using the W3C PROV-O
ontology in the field of Earth and environmental sciences
• GCIS will enrich the GCIS ontology in its provenance
tracing capability, eventually covering provenance
information for the entire scope of global change
• Collaboration on a PROV-ES ontology for Earth and
environmental sciences
Thank you!
Sponsors
gcis rpi