Introduction to eScience and Semantic Web Professor Deborah McGuinness TA – Weijing Chen Other lectures from Professor Joanne Luciano, grad student Jim McCusker, and possibly.
Introduction to eScience and
Semantic Web
Professor Deborah McGuinness
TA – Weijing Chen
Other lectures from Professor Joanne Luciano, grad student Jim McCusker,
and possibly others from http://tw.rpi.edu/web/People
CSCI 6962 - 01, 86933 , CSCI 4969 - 01, 87927
ITWS 6960 - 01, 87198 , ITWS 4969 - 01, 87928
Week 1, initially August 29, 2011
Moved because of Hurricane Irene to Wednesday August 31, 2011
Admin info (keep/ print this slide)
• Class:
– CSCI 6962 - 01, 86933, CSCI 4969 - 01, 87927
– ITWS 6960 - 01, 87198, ITWS 4969 - 01, 87928
• Hours: 1pm-3:50pm Mondays (except after
Columbus day)
• Class Location: Winslow 1140
• Instructors: Deborah McGuinness, TA Weijing Chen,
Guests: Joanne Luciano, Jim McCusker
• Contacts: [email protected], [email protected],
[email protected], [email protected]
• Contact locations: Winslow 2104 (DLM), 2143 (JSL)
For each class
• Titanpad – this week
http://twc.titanpad.com/147
• Scribe for each class – this week Weijing
• After class – scribe copies notes over to the
class page
• Class Page:
http://tw.rpi.edu/web/Courses/SemanticeScience/2011
• You will need an account on our site so that you can upload
your homeworks and presentations – contact Patrick West –
who is in class
• See http://tw.rpi.edu/web/Help/UploadLinkToMedia for
uploading instructions
Quick hints (from patrick)
• It's just a matter of adding a tag to the body of
the drupal page: <document
href="SemanteScience2011Assignment00.pdf"
alt="Semantic eScience 2011 Assignment 00"/>
• When you save the page, next to the title, you'll
see an Upload link. Click on that, upload the
document, and when you click "Upload" the
page will be changed from an Upload link to a
Download link.
• To upload a new version of the document go to
http://tw.rpi.edu/media/submit.php
Introductions
• Who are we?
• Who are you?
• Why are you here?
• What do you want to get out of the class?
• Will you make the class (on time) each week
and do you have any other conflicts or issues
we should know about?
“Knowledge is the common
wealth of humanity”*
In the Earth and space sciences and elsewhere,
ready and open access to the vast and growing
collections of cross-disciplinary digital information
is the key to understanding and responding to
complex Earth system phenomena that influence
human survival.
We have a shared responsibility to create and
implement strategies to realise the full potential of
digital information and services for present and
future generations.
*Adama Samassekou, Convener of the UN World Summit on the Information Society
Brain Storming
• What do you think we need to address to
start to realize the vision on the previous
viewgraph?
Contents
• Outline of the course
• Background
• e-Science
• Examples
• Informatics
• Semantics
• Elements of Semantic e-Science (SeS)
• What we expect
• Logistics summary
Outline of the course
• Topics for Semantic e-Science/ Foundations:
– Semantic Methodologies
– Knowledge Representation for e-Science
– Ontology Engineering and Re-Use for e-Science
– Knowledge Integration for e-Science
– Semantic Data Integration
– Semantic Web Languages, Tools and Services
– Semantic Infrastructure and Architecture for e-Science
– Semantic Grid Middleware
– Ontology Evolution for e-Science
– Knowledge Management for e-Science
– e-Science Workflow Management
– Data life-cycle for e-Science
– Data Mining and Knowledge Discovery
Background
People (scientists) should be able to access a global,
distributed knowledge base of (scientific) data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using
various protocols, in differing vocabularies, using
(sometimes unstated) assumptions, with
inconsistent (or non-existent) meta-data. It may be
inconsistent, incomplete, evolving, and distributed
And… there often exist significant levels of semantic
heterogeneity, large-scale data, complex data
types, legacy systems, inflexible and unsustainable
implementation technology…
What do we need to achieve Semantic eScience?
(in-class brainstorming exercise (2010))
organization, leadership, management strategies, roles and
assignment of roles
dissemination strategy
communication of ideas
- machine level
- human level
conflict resolution
cross-disciplinary
collaboration
flexible
adaptable, feedback
extensible
ability to filter information
usage/application of resources, optimization
facts, knowledge (domain knowledge)
context, domain, scope
goals, use cases
metadata - data to describe data
ability to link information
ability to understand information
ability to capture and represent conflicting ideas
provenance - where data come from
trust - reliable
ability to capture intent (humanitarian aspect / responsibility)
credibility of information
interesting and appealing
standardization
education and outreach
methods and metrics
criteria for evaluation
The Information Era: Interoperability
Modern information and communications
technologies are creating an
“interoperable” information era in which
ready access to data and information can
be truly universal. Open access to data
and services enables us to meet the new
challenges of understanding the Earth and
its space environment as a complex
system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.
Information products have Lots of Audiences
But data has Lots of Audiences
[Figure: audiences arranged from More Strategic to Less Strategic, annotated "SCIENTISTS TOO"]
From “Why EPO (Education and Public Outreach)?”, a NASA internal
report on science education, 2005
Shifting the Burden from the User
to the Provider
Fox CI and X-informatics - CSIG 2008, Aug 11
e-Science
• Emphasis is on Science
• Original narrative: One of the key drivers behind the search for such new
scientific tools is the imminent deluge of data from new generations of
scientific experiments and surveys (*). In order to exploit and explore the
petabytes of scientific data that will arise from these high-throughput
experiments, supercomputer simulations, sensor networks, and satellite
surveys, scientists will need assistance from specialized search engines,
data mining tools, and data visualization tools that make it easy to ask
questions and understand answers. To create such tools, the data will
need to be annotated with relevant "metadata" giving information as to
provenance, content, conditions, and so on; and, in many instances, the
sheer volume of data will dictate that this process be automated.
Scientists will create vast distributed digital repositories of scientific data
requiring management services similar to those of more conventional
digital libraries, as well as other data-specific services. The ability to
search, access, move, manipulate, and mine such data will be a central
requirement for this new generation of collaborative science software
applications. Hey and Trefethen, 2005
Evolving Science
• Thousand years ago:
science was empirical
describing natural phenomena
• Last few hundred years:
theoretical branch
using models, generalizations
• Last few decades:
a computational branch
simulating complex phenomena
• Today:
data exploration (eScience)
synthesizing theory, experiment and
computation with advanced data
management and statistics
new algorithms!
$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
Living in an Exponential World
• Scientific data doubles every year
– caused by successive generations
of inexpensive sensors +
exponentially faster computing
• Changes the nature of scientific computing
• Cuts across disciplines (eScience)
• It becomes increasingly hard to extract knowledge
• 20% of the world’s servers go into huge data centers
by the “Big 5”
– Google, Microsoft, Yahoo, Amazon, eBay
• So it is not only the scientific data!
[Figure: logarithmic plot spanning 0.1 to 1000 over the years 1970 to 2000, with series labelled CCDs and Glass]
Collecting Data
• Very extended distribution of data sets:
data on all scales!
• Most datasets are small, and manually
maintained (Excel spreadsheets)
• Total amount of data dominated by the other
end
(large multi-TB archive facilities)
• Most bytes today are collected via electronic
sensors
Making Discoveries
• Where are discoveries made?
– At the edges and boundaries
– Going deeper, collecting more data, using more colors….
• Metcalfe’s law
– Utility of computer networks grows as the
number of possible connections: O(N^2)
• Federating data (the connections!!)
– Federation of N archives has utility O(N^2)
– Possibilities for new discoveries grow as O(N^2)
• Many examples
– Sky surveys – galaxy zoo… Very early discoveries from SDSS, 2MASS, DPOSS
– Genomics+proteomics
– Alzheimer's article in the reading
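The O(N^2) growth quoted for Metcalfe's law and for federated archives can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that every distinct pair of archives counts as one possible connection; the function name `pairwise_links` is illustrative and does not come from the slides.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n federated archives: n(n-1)/2, which is O(N^2)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Ten times more archives gives roughly a hundred times more possible connections.
for n in (2, 10, 100):
    print(n, "archives ->", pairwise_links(n), "possible pairwise connections")
```

Under this assumption, adding one archive to a federation of N adds N new possible connections, which is the intuition behind "the connections!!".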
Data Delivery: Hitting a Wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh!, and 1 PB ~4,000 disks
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~1 $/GB)
• … 2 days and 1K$
• … 3 years and 1M$
• At some point you need
– indices to limit search
– parallel data search and analysis
• This is where databases can help
• Take the analysis to the data!!
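The scaling behind the GREP figures can be reproduced with simple arithmetic. The sketch below assumes a constant sequential rate of 1 GB per minute (taken from the "1 GB in a minute" bullet) and decimal units; the function name `scan_seconds` is an illustrative choice. With these assumptions 1 TB takes about 0.7 days and 1 PB about 1.9 years, the same order of magnitude as the rounded "2 days" and "3 years" on the slide.

```python
SECONDS_PER_DAY = 86_400


def scan_seconds(num_bytes: float, bytes_per_second: float = 1e9 / 60) -> float:
    """Time to read num_bytes sequentially at a fixed rate (default: 1 GB per minute)."""
    return num_bytes / bytes_per_second


# Sequential scan time grows linearly with data volume, so each factor of
# 1000 in size costs a factor of 1000 in time.
for label, size in [("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)]:
    days = scan_seconds(size) / SECONDS_PER_DAY
    print(f"{label}: {days:,.3f} days")
```

Because the cost is linear in the data volume, only an index (to read less) or parallel search (to read faster) changes the picture, which is the point of the last bullets.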
Mind the Gap!
• As a result of finding out who is doing what, sharing experience/ expertise, and substantial coordination:
• There is/ was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. Wikipedia.
World-Wide Emerging Technology
Trends
• Innovation will come from other parts of the world
other than the U.S.
• The Chinese have skipped the Internet first
generation.
• Growth will occur in Asia, and continue to
decrease in Western Europe.
• U.S. Industry is compulsively outsourcing abroad.
• Software is moving from forms-based applications
to business processes.
• Networks are migrating to IP and optical
networking technologies.
Cyberinfrastructure
• Data curation and storage
• Federated access
• Collaboration
• New uses in High Performance Computing
• Databases
• Web servers, services (software as service)
• Wiki
• Visualization
• All discipline neutral
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
SemantEco
• Water Quality Portal Example from 2010
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
Ex. 1: Virtual Observatories
Make data and tools quickly and easily accessible
to a wide audience.
Operationally, virtual observatories need to find the
right balance of data/model holdings, portals and
client software that researchers can use without
effort or interference as if all the materials were
available on his/her local computer using the
user’s preferred language: i.e. appear to be
local and integrated
Likely to provide controlled vocabularies that may
be used for interoperation in appropriate
domains along with database interfaces for
access and storage -> thus part IT, part CI, part
Informatics and all about doing new science
Mediation Layer
• Ontology - capturing concepts of Parameters,
Instruments, Date/Time, Data Product (and
associated classes, properties) and Service
Classes
• Maps queries to underlying data
• Generates access requests for metadata,
data
• Allows queries, reasoning, analysis, new
hypothesis generation, testing, explanation,
etc.
[Figure: layered virtual observatory architecture with the labels: Added value (education, clearinghouses, other services, disciplines, etc.); Semantic mediation layer - mid-upper-level; VO Portal, Web Serv., VO API; Semantic interoperability; Semantic query, hypothesis and inference; Semantic mediation layer - VSTO - low level; Metadata, schema, data; DB1, DB2, DB3, …, DBn; Query, access and use of data]
Science and technical use cases
Find data which represents the state of the neutral
atmosphere anywhere above 100km and toward the
arctic circle (above 45N) at any time of high
geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and
integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services
via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere
Observatory that retrieve
data, filtered by constraints on Instrument, Date-Time,
and Parameter in any order and with constraints
included in any combination.
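The idea of constraints on Instrument, Date-Time and Parameter "in any order and in any combination" can be sketched with optional filters. This is a hedged illustration only: it is not the observatory's SOAP interface, and the `Record` class, the `query` function and the sample catalog entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical catalog entry; not the real service's data model."""
    instrument: str
    parameter: str
    time: datetime


def query(records: Iterable[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          start: Optional[datetime] = None,
          end: Optional[datetime] = None) -> List[Record]:
    """Return records matching every supplied constraint; omitted constraints match anything."""
    return [
        r for r in records
        if (instrument is None or r.instrument == instrument)
        and (parameter is None or r.parameter == parameter)
        and (start is None or r.time >= start)
        and (end is None or r.time <= end)
    ]


# Made-up sample data for illustration.
catalog = [
    Record("FPI", "NeutralTemperature", datetime(2003, 10, 31)),
    Record("ISR", "ElectronDensity", datetime(2003, 10, 31)),
    Record("FPI", "NeutralTemperature", datetime(2004, 1, 15)),
]

# Keyword arguments can be given in any order and any combination.
print(query(catalog, end=datetime(2003, 12, 31), parameter="NeutralTemperature"))
```

Keyword arguments make the order of constraints irrelevant, and treating a missing constraint as "match anything" is what allows arbitrary combinations.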
Inferred plot type
and return required
axes data
Semantic Web Benefits
• Unified/ abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the
number of selections from eight to three
• Generates only syntactically correct queries: which could not always be
ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a
reasoner, our application can expose only coherent
queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain
codes) to account for numerous different ways to combine and plot the
data whereas now semantic mediation provides the level of sensible data
integration required, and exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis,
transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional
research associates and those from outside the fields)
But data has Lots of Audiences
More Strategic
Less Strategic
From “Why EPO?”, a NASA internal
report on science education, 2005
What is a Non-Specialist Use Case?
A teacher accesses the internet, goes
to an Educational Virtual
Observatory and enters a
search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge
What should the User Receive?
Teacher receives four groupings of search
results:
1) Educational materials:
http://www.meted.ucar.edu/topics_spacewx.php
and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs
but the search for brightness, or green/red line
emission is mediated for them
3) Did you know?: Aurora is a phenomenon of
the upper terrestrial atmosphere (ionosphere)
also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora
Australis, etc.
Semantic Information Integration:
Concept map for educational use of
science data in a lesson plan
Ex 2 – SemantEco /
SemantAqua
• Water Quality Portal Example from 2010
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• Came from hw assignment, proposed in class
• Generated papers in
– Environmental Information Management 2011
– Intl Semantic Web Conference 2011 (main
conference and possibly poster session as well)
– American Geophysical Union 2011
– Plus invited presentations for water, health, etc.
Semantic Web Basics
• The triple: {subject-predicate-object}
Interferometer is-a optical instrument
Optical instrument has focal length
An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for
languages, specifications, best practices, etc.
– RDF - Resource Description Framework
– OWL 1.0 - Web Ontology Language (OWL 2.0 on the way)
• Encode the knowledge in triples in a triple-store; software is
built to traverse the semantic network, which can be queried or
reasoned upon
• Put semantics between/ in your interfaces, i.e. between layers
and components in your architecture, i.e. between ‘users’ and
‘information’ to mediate the exchange
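The triple idea can be made concrete with a tiny in-memory store. This is a minimal sketch using plain Python tuples rather than an RDF library; the helper names (`objects`, `ancestors`, `inherited`) are illustrative, and the rule that a "has" property is inherited along is-a links is an assumption made for the example, not something the slide states.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# The two example statements from the slide, written as (subject, predicate, object).
triples: Set[Triple] = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def objects(store: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in store if s == subject and p == predicate}


def ancestors(store: Set[Triple], cls: str) -> Set[str]:
    """Follow is-a links transitively (a simple traversal of the semantic network)."""
    seen: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(store, current, "is-a"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def inherited(store: Set[Triple], cls: str, predicate: str) -> Set[str]:
    """Values of predicate on cls plus those on its is-a ancestors (assumed inheritance rule)."""
    result = objects(store, cls, predicate)
    for parent in ancestors(store, cls):
        result |= objects(store, parent, predicate)
    return result


print(inherited(triples, "Interferometer", "has"))
```

The query for what an interferometer "has" returns a focal length even though no triple says so directly, which is the kind of answer a reasoner over a triple-store provides.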
Terminology
• Semantic Web
– An extension of the current web in which information is given well-defined
meaning, better enabling computers and people to work in cooperation,
www.semanticweb.org
– Primer: http://www.ics.forth.gr/isl/swprimer/
• Semantic Grid
– Semantic services to use the resources of many computers connected by a
network to solve large scale computational/ data problems
• Provenance
– origin or source from which something comes, intention for use, who/what
generated for, manner of manufacture, history of subsequent owners, sense
of place and time of manufacture, production or discovery, documented in
detail sufficient to allow reproducibility.
• Service-oriented architecture
– Provision of a capability over the internet via a ‘remote-procedure-call’ using
prescribed input, output and pre-conditions
• Ontology (n.d.). The Free On-line Dictionary of Computing.
http://dictionary.reference.com/browse/ontology
– An explicit formal specification of how to represent the objects, concepts and
other entities that are assumed to exist in some area of interest and the
relationships that hold among them.
Terminology
• Closed World - where complete knowledge is known (encoded), AI relied on this
• Open World - where knowledge is incomplete/ evolving, SW promotes this
• Languages
– OWL - Web Ontology Language (W3C)
– RDF - Resource Description Framework (W3C)
– OWL-S/SWSL - Web Services (W3C)
– WSMO/WSML - Web Services (EC/W3C)
– SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
– PML - Proof Markup Language
– Editors: Protégé, SWOOP, Medius, SWeDE, …
• Reasoners
– Pellet, Racer, Medius KBS, FaCT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
– SPARQL, XQuery, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
– Search: SWOOGLE swoogle.umbc.edu
– Collaboration: www.planetont.org
– Other: Jena, Sesame/SAIL, Mulgara, Eclipse, KOWARI
– Semantic wiki: OntoWiki, SemanticMediaWiki
• Emerging Semantic Standards for Earth Science
– SWEET, VSTO, MMI, GeoSciML
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
Application Areas for Semantics
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification …. and the list goes on
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis - OntoWiki, SemanticMediaWiki
• Sensor Web
• Software engineering
• Explanation
2007-2008 Hype Cycle for Emerging
Semantic Web Technologies v0.6
Produced for NASA TIWG semantic web subgroup, April 2008
[Figure: hype cycle curve plotting Visibility against Time through the phases Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity. Each technology is marked with its estimated years to mainstream adoption in Earth science: < 2 years, 2-5 years, 5-10 years, > 10 years, or obsolete before plateau.]
Technologies placed on the curve (positions are not recoverable from the extracted text):
• Semantic Web Services
• Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial
• Semantic Wiki
• Smart search, e.g. NOESIS
• Rules/Logic, SWRL
• Query Lang, SPARQL
• Query Lang, OWL-QL
• Query Lang, Commercial and embedded QL
• Ontology editor, SWOOP
• Protégé
• Concept map, Cmap
• RDF
• OWL 1.0
• OWL 1.1
• XML
• Tagging / annotation
• DL Reasoners, e.g. Pellet, Racer
• SKOS
• FOAF
• Species Validators
• Natural Language Ontologies
• Upper level ontologies, e.g. ABC, DOLCE, SUMO
• Mid-level ES domain ontologies, e.g. GEON
• Mid-level ES domain ontologies, e.g. SWEET
• Managing modular ontologies (ES and general)
Semantic Web Roadmap
Time horizons: Current, Near Term (0-2 yrs), Mid Term (2-5 yrs), Long Term (5+ yrs)
Results
• Outcome (column placement not recoverable from the extracted text): Improved Information Sharing; Increased Collaboration & Interdisciplinary Science; Acceleration of Knowledge Production; Revolutionizing how science is done
• Output (column placement not recoverable from the extracted text): Geospatial semantic services established; Geospatial semantic services proliferate; Scientific semantic assisted services; Autonomous inference of science results
Capability
• Assisted Discovery & Mediation
– Current: Some common vocabulary based product search and access
– Near Term: Semantic geospatial search & inference, access
– Mid Term: Semantic agent-based searches
– Long Term: Semantic agent-based integration
• Interoperable Information Infrastructure
– Current: Local processing + data exchange
– Near Term: Basic data tailoring services (data as service), verification/ validation
– Mid Term: Interoperable geospatial services (analysis as service), results explanation service
– Long Term: Metadata-driven data fusion (semantic service chaining), trust
Technology
• Vocabulary
– Current: SWEET core 1.0 based on GCMD/CF
– Near Term: SWEET core 2.0 based on best practices decided from community
– Mid Term: SWEET 3.0 with semantic callable interfaces via standard programming languages
– Long Term: Reasoners able to utilize SWEET 4.0
• Languages/ Reasoning
– Current: RDF, OWL, OWL-S
– Near Term: Geospatial reasoning, OWL-Time
– Mid Term: Numerical reasoning
– Long Term: Scientific reasoning
Semantic Web Roadmap (capability)
April 2008
Time horizons: Current, Near Term (0-2 yrs), Mid Term (2-5 yrs), Long Term (5+ yrs)
• Assisted Discovery & Mediation
– Current: Some common vocabulary based product search and access
– Near Term: Semantic geospatial search & inference, access
– Mid Term: Semantic agent-based searches
– Long Term: Semantic agent-based integration
• Assisted Knowledge Building
– Current: Some metadata and limited provenance available
– Near Term: Ontologies for data mining, visualization and analysis emerging/ maturing
– Mid Term: Common terminology captured in ontologies, crossing domains
– Long Term: Provenance/ annotation with ontologies in user tools
• Verifiable Information Quality
– Current: Verification is manual with minimal tool support
– Near Term: Ontologies for information quality developed
– Mid Term: Domain and range properties in ontologies used in tools
– Long Term: Service ontologies carry quality provenance
• Responsive Information Delivery
– Current: Services must be hardwired and service agreements established
– Near Term: Services annotated with resource descriptions
– Mid Term: Dynamic service discovery and mediation, and data scheduling
– Long Term: Semantic markup of data latency (time lags) which adapt dynamically
• Interoperable Information services
– Current: Local processing + data exchange
– Near Term: Basic data tailoring services (data as service), verification/ validation
– Mid Term: Interoperable geospatial services (analysis as service), results explanation service
– Long Term: Metadata-driven data fusion (semantic service chaining), trust
• Interactive Data Analysis
– Current: Limited metadata passed to analysis applications
– Near Term: Tag properties, non-jargon vocabulary for non-specialist use
– Mid Term: Shared terminology for the visual properties of interface objects and graph types...
– Long Term: Semantic fields to describe tag key modal functions.
• Seamless Data Access
– Current: Access mediated by agreed standard vocabularies, hard-wired connections
– Near Term: Access mediated by common ontologies
– Mid Term: Mediation aided by services with domain/ range properties
– Long Term: Key data access services are semantically mediated
Roadmap - from near-term to mid-term
• Assisted Discovery & Mediation
– Near Term (0-2 yrs): Semantic geospatial search & inference, access
– Requires: agent development and vocabulary for agent characterization
– Mid Term (2-5 yrs): Semantic agent-based searches
• Assisted Knowledge Building
– Near Term (0-2 yrs): Ontologies for data mining, visualization and analysis emerging/ maturing
– Requires: mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework
– Mid Term (2-5 yrs): Common terminology captured in ontologies, crossing domains
• Verifiable Information Quality
– Near Term (0-2 yrs): Ontologies for information quality developed
– Requires: mature quality and uncertainty ontologies with domain and range properties added and populated
– Mid Term (2-5 yrs): Domain and range properties in ontologies used in tools
• Responsive Information Delivery
– Near Term (0-2 yrs): Services annotated with resource descriptions
– Requires: semantic service (ontology) registry
– Mid Term (2-5 yrs): Dynamic service discovery and mediation, and data scheduling
• Interoperable Information services
– Near Term (0-2 yrs): Basic data tailoring services (data as service), verification/ validation
– Requires: service to implement v/v, new descriptions of analyses, developing explanation
– Mid Term (2-5 yrs): Interoperable geospatial services (analysis as service), results explanation service
• Interactive Data Analysis
– Near Term (0-2 yrs): Tag properties, non-jargon vocabulary for non-specialist use
– Requires: development of portal modal function vocabulary and ontology, link to domain context and data structure
– Mid Term (2-5 yrs): Shared terminology for the visual properties of interface objects and graph types...
• Seamless Data Access
– Near Term (0-2 yrs): Access mediated by common ontologies
– Requires: adding properties to classes in ontologies and populating instances with expert agreement
– Mid Term (2-5 yrs): Mediation aided by services with domain/ range properties
Selected Technical Benefits
1. Integrating Multiple Data Sources
2. Semantic Drill Down / Focused Perusal
3. Statements about Statements
4. Inference
5. Translation
6. Smart (Focused) Search
7. Smarter Search … Configuration
8. Proof and Trust
Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National
Security. May, 2005. http://www.schafertmd.com/swans/agenda.html
1: Integrating Multiple Data
Sources
• The Semantic Web lets us merge
statements from different sources
• The RDF Graph Model allows
programs to use data uniformly
regardless of the source
• Figuring out where to find such
data is a motivator for Semantic
Web Services
[Figure: RDF graph for #Ionosphere with the properties hasCoordinates (#magnetic), name (“Terrestrial Ionosphere”), hasLowerBoundaryValue (“100”) and hasLowerBoundaryUnit (“km”). Different line & text colors represent different data sources.]
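Merging statements from different sources can be illustrated with sets of triples. This is a hedged sketch in plain Python rather than an RDF library; the split of the slide's statements between `source_a` and `source_b` is an assumption (the colors that marked the sources are lost), and the helper `describe` is illustrative.

```python
from collections import defaultdict

# Statements about the same resource, assumed to come from two different sources.
source_a = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Because both sources use the same triple shape and the same identifier
# for the subject, merging is just a set union.
merged = source_a | source_b


def describe(graph, subject):
    """Collect every property/value pair known about a subject, regardless of source."""
    props = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            props[p].add(o)
    return dict(props)


print(describe(merged, "#Ionosphere"))
```

The uniform graph model is what makes the merge trivial: a program reading `merged` does not need to know which source contributed which statement.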
2: Drill Down / Focused
Perusal
• The Semantic Web uses Uniform
Resource Identifiers (URIs) to
name things
• These can typically be resolved
to get more information about the
resource
• This essentially creates a web of
data analogous to the web of text
created by the World Wide Web
• Ontologies are represented using
the same structure as content
– We can resolve class and
property URIs to learn about the
ontology
[Figure: graph of resolvable resources with the nodes …#NeutralTemperature, ...#ISR, ...#FPI, ...#MillstoneHill, …#EISCAT, …#Norway and Internet, linked by the properties measuredby, type, operatedby and locatedIn]
3: Statements about Statements
• The Semantic Web allows us to
make statements about
statements
– Timestamps
– Provenance / Lineage
– Authoritativeness / Probability /
Uncertainty
– Security classification
– …
• This is an unsung virtue of the
Semantic Web
[Figure: a statement about #Aurora (hascolor Red) annotated with hasSource #Danny’s and hasDateTime 20031031]
Ontologies Workshop, APL May 26, 2006
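A statement about a statement can be sketched by attaching metadata to a triple. This is a simplified illustration, not RDF reification syntax: the `AnnotatedStatement` class and `from_source` helper are hypothetical names, and the pairing of the diagram labels (the aurora's color as the statement, source and date-time as its annotations) is an assumption about the lost figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class AnnotatedStatement:
    """A triple plus statements about that triple (timestamp, provenance, and so on)."""
    statement: Triple
    about: Dict[str, str] = field(default_factory=dict)


observation = AnnotatedStatement(
    statement=("#Aurora", "hascolor", "Red"),
    about={"hasSource": "#Danny's", "hasDateTime": "20031031"},
)


def from_source(statements: List[AnnotatedStatement], source: str) -> List[Triple]:
    """Select the triples whose provenance annotation names the given source."""
    return [a.statement for a in statements if a.about.get("hasSource") == source]


print(from_source([observation], "#Danny's"))
```

Keeping the annotations separate from the triple lets a consumer filter by source or date before deciding whether to trust the underlying claim.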
4: Inference
• The formal foundations of
the Semantic Web allow
us to infer additional
(implicit) statements that
are not explicitly made
• Unambiguous semantics
allow question answerers
to infer that objects are
the same, objects are
related, objects have
certain restrictions, …
• SWRL allows us to make
additional inferences
beyond those provided by
the ontology
[Figure: graph with the nodes #Millstone Hill, #Interferometer and #VerticalMeans and the properties OperatesInstrument, hasInstrument, isOperatedBy, Measures, hasTypeofData, hasOperatingMo, hasMeasuredData]
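One simple kind of inference is deriving an implicit statement from a declared inverse property. The sketch below assumes that OperatesInstrument and isOperatedBy (both labels appear in the slide's diagram) are inverses of each other; that pairing, the sample triple and the function name `infer_inverses` are assumptions for illustration, not the ontology's actual axioms.

```python
# Assumed inverse-property declaration (in OWL this would be owl:inverseOf).
INVERSE_OF = {"OperatesInstrument": "isOperatedBy"}


def infer_inverses(triples, inverse_of):
    """Return the input triples plus every statement implied by the inverse declarations."""
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})
    inferred = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            # (s p o) with p inverse of q implies (o q s).
            inferred.add((o, both_ways[p], s))
    return inferred


asserted = {("#MillstoneHill", "OperatesInstrument", "#Interferometer")}
closure = infer_inverses(asserted, INVERSE_OF)
print(sorted(closure))
```

Only one statement was asserted, yet a query for what the interferometer is operated by now has an answer; a description logic reasoner generalizes this to many more kinds of axioms.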
5: Translation
• While encouraging sharing,
the Semantic Web allows
multiple URIs to refer to the
same thing
• There are multiple levels of
mapping
– Classes
– Properties
– Instances
– Ontologies
• OWL supports equivalence
and specialization; SWRL
allows more complex
mappings
[Figure: the concept #precipitation is named ont1:Precipitation for ont1:EduLevel VO:Scientist and ont2:Rain for ont2:EduLevel EduVO:K-12]
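Translation between vocabularies can be sketched as a lookup over declared mappings. This example treats ont1:Precipitation and ont2:Rain from the slide's diagram as one mapped group, which is a simplification (the slide notes that OWL distinguishes equivalence from specialization); the `MAPPED_TERMS` table and the `translate` function are hypothetical.

```python
# Groups of terms from different ontologies that are mapped to each other (assumed).
MAPPED_TERMS = [
    {"ont1:Precipitation", "ont2:Rain"},
]


def translate(term, target_prefix, mappings):
    """Return the mapped term in the target ontology, or None if no mapping is declared."""
    for group in mappings:
        if term in group:
            for candidate in sorted(group):
                if candidate.startswith(target_prefix):
                    return candidate
    return None


# A scientist-level term rendered in the K-12 vocabulary, and back again.
print(translate("ont1:Precipitation", "ont2:", MAPPED_TERMS))
print(translate("ont2:Rain", "ont1:", MAPPED_TERMS))
```

Returning None for unmapped terms keeps the open-world flavor: the absence of a mapping means "unknown", not "no equivalent exists".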
6: Smart (Focused) Search
• The Semantic Web
associates 1 or more
classes with each
object
• We can use ontologies
to enhance search by:
– Query expansion
– Sense disambiguation
– Type with restrictions
– ….
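Query expansion, the first enhancement listed above, can be sketched as walking a class hierarchy. The hierarchy below is hypothetical: it reuses the Aurora Borealis and Aurora Australis terms from the earlier non-specialist use case, and the names `NARROWER` and `expand_query` are illustrative.

```python
# Hypothetical fragment of an ontology: each term maps to its narrower terms.
NARROWER = {
    "Aurora": ["AuroraBorealis", "AuroraAustralis"],
}


def expand_query(term, narrower):
    """Return the term followed by all narrower terms reachable from it, without duplicates."""
    expanded = [term]
    index = 0
    while index < len(expanded):
        for child in narrower.get(expanded[index], []):
            if child not in expanded:
                expanded.append(child)
        index += 1
    return expanded


print(expand_query("Aurora", NARROWER))
```

A search for the general term then also matches documents tagged only with the more specific ones, without the user needing to know those terms.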
7: Smarter Search / Configuration
GEONGRID Ontology Search
and Data Integration Example
Uses emerging web standards to enable smart web
applications
Given an upper-level domain choice
•Ecology
Illustrate or list contained concepts/hierarchy
•VegetationCover, TreeRings, etc.
Retrieve some specific options from web
•Maps, tree-ring data,
Info: https://portal.geongrid.org:8443/gridsphere/gridsphere
8: Proof
• The logical foundations
of the Semantic Web
allow us to construct
proofs that can be used
to improve transparency,
understanding, and trust
• Proof and Trust are on-going
research areas for
the Semantic Web: e.g.,
See PML and Inference Web
[Figure: #Critical Dataset hasCalibration #FlatField; #FlatField hasPeerReview #Solar Physics Paper. “Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”]
Inference Web
Framework for explaining reasoning tasks by storing,
exchanging, combining, annotating, filtering, segmenting,
comparing, and rendering proofs and proof fragments
provided by multiple distributed reasoners.
• OWL-based Proof Markup Language (PML) specification as
an interlingua for proof interchange
• IWExplainer for generating and presenting interactive
explanations from PML proofs providing multiple dialogues
and abstraction options
• IWBrowser for displaying (distributed) PML proofs
• IWBase distributed repository of proof-related meta-data such
as inference engines/rules/languages/sources
• Integrated with theorem provers, text analyzers, web
services, …
http://iw.rpi.edu
Inference Web Infrastructure
(McGuinness, et al., 2004 http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html )
Framework for explaining question answering tasks by
• abstracting, storing, exchanging,
• combining, annotating, filtering, segmenting,
• comparing, and rendering proofs and proof fragments
provided by question answerers.
[Figure: Inference Web architecture with the following labels]
• Sources and reasoners: Files/WWW; Semantic Discovery Service (DAML/SNRC), OWL-S/BPEL; CWM (NSF TAMI), N3; JTP (DAML/NIMD), KIF; SPARK (DARPA CALO), SPARK-L; UIMA (DTO NIMD Exp Aggregation), Text Analytics
• Proof Markup Language (PML): Trust, Justification, Provenance
• Toolkit: IWTrust (trust computation); IW Explainer/ Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration)
SW Questions & Answers
Users can explore extracted entities and relationships, create new
hypotheses, ask questions, browse answers and get explanations for
answers.
[Figure: screenshot showing a question, an answer, a context for explaining the answer, and an abstracted explanation]
(this graphical interface done by Battelle supported by Stanford KSL)
Summary
• Semantics are a key ingredient for progress in
informatics and eScience
• A sustained involvement of key inter-disciplinary
team members is very important -> leads to
incentives, rewards, etc. and a balance of research
and production
• This is what we will be teaching you in this class
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
Outline of the course
• Topics for Semantic e-Science/ Foundations:
– Semantic Methodologies
– Knowledge Representation for e-Science
– Ontology Engineering and Re-Use for e-Science
– Knowledge Integration for e-Science
– Semantic Data Integration
– Semantic Web Languages, Tools and Services
– Semantic Infrastructure and Architecture for e-Science
– Semantic Grid Middleware
– Ontology Evolution for e-Science
– Knowledge Management for e-Science
– e-Science Workflow Management
– Data life-cycle for e-Science
– Data Mining and Knowledge Discovery
SeS Applications and Ontologies
• Semantic Web for Health Care and Life Science
• Semantic Web for Bio-Med-informatics
• Semantic Web for System and Integrated Biology
• Semantic Web for Sun, Earth, Environment and
Climate
• Semantic Web for Chemistry, Physics and
Astronomy
• Semantic Web for Engineering
• Semantic Web and Digital Libraries and Scientific
Publications
SeS Project options
• Configuration and Deployment of Semantic Virtual
Observatories
– Oceanography, astronomy, geology
• Ontology Merging and Validation Test-bed
• Semantic Language and Tool Use and Evaluation
• Semantic eScience Implementation Evaluation
• Semantic Collaboration Case Studies
• Semantic Application Development and
Demonstration
Schedule – web page
• Reading assignments
• Assignments
– Individual
– Group
• Written assessments
• Presentation assessments
• Group assessments
What we expect
• Attend class, complete assignments
• Participate
• Ask questions – be honest with yourself and
others about what you do and do not know
• Work both individually and in a group
• Work constructively in group and class
sessions
Logistics summary
• Class - Monday 1-3:50pm
• Office hours – By Appointment along with a regular time to
be determined and tetherless night
• This week's assignment:
– Reading - Ontologies 101*, Semantic Web, e-Science,
RDFS
– Turn in a one page description of one of your favorite
papers AND WHY from the reading list
• Next class (week 2 – September 12; note Labor Day):
– Foundations I: Methodologies, Knowledge Representation
• If you think your background calls for some extra
background reading, talk to us.
• Questions?
Extra
Digital natives expect services to
accommodate their preferences.
• Information online, not “in line”
• Information on-demand, free of place or time
• Blended classroom and online experience
• Flexible schedule for working students
• Relevant and timely content
• More team collaboration
• More content from multiple sources
• Interactive content from voice, video and data
• Ability to contribute, as well as consume,
content/knowledge
• Leads to virtual access…
Progression after progression
[Figure: progression with the stages IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka Xinformatics; Science, Societal Benefit Areas. The three Informatics stages are grouped under the label Informatics.]
Summary
• The data and information challenges are (almost)
being identified as increasingly common
• Data and information science is becoming the
‘fourth’ column (along with theory, experiment
and computation)
• Informatics is playing a key role in filling the gap
between science (and the spectrum of non-expert) use and generation and the underlying
cyberinfrastructure – evident due to the
emergence of Xinformatics (world-wide)
• Informatics is a profession and a community
activity and requires efforts in all 3 sub-areas
(science, core, cyber) and must be synergistic
Background
Scientists should be able to access a global, distributed
knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using
various protocols, in differing vocabularies, using
(sometimes unstated) assumptions, with
inconsistent (or non-existent) meta-data. It may be
inconsistent, incomplete, evolving, and distributed
And… there often exist significant levels of semantic
heterogeneity, large-scale data, complex data
types, legacy systems, inflexible and unsustainable
implementation technology…