Introduction to Protégé for Absolute Beginners University at Buffalo August 11-12, 2012 Goal and Content of Tutorial • The goal of the tutorial is.

Introduction to Protégé for
Absolute Beginners
University at Buffalo
August 11-12, 2012
Goal and Content of Tutorial
• The goal of the tutorial is to explain how to
translate ontologies into a language that can
be processed by computers
• Three main sections by content:
– Overview of the Web Ontology Language (OWL)
– Hands-on training in Protégé, an OWL editor
– Overview of SPARQL Protocol and RDF Query
Language (SPARQL), a query language for
retrieving and modifying ontologically grounded
information
2
IS THE GOAL WORTHWHILE?
3
The Current State of Data Integration
on the Web
• Search engines return some remarkably
precise results but the precision degrades as
the topics become less standardized
4
A Query Containing Standardized
Terms…
5
…Yields Very Good Results
6
But as the Terms Become Less
Standardized…
7
…the Results Become Less Precise
8
The Current State of Data Integration
in the Enterprise
• Using more than a single software application
carries a risk of added cost to combine the
information they create.
– Databases carry very little meta-data about the
content of information they contain
– Spreadsheets most often carry less
9
In the Social Network, Hashtags
Cluster Information Into Categories
• But the ambiguities of language reappear in the categories
• and the lack of rigor in relating one category to another is an obstacle to machine-based validation of usage.
10
The Value Added by OWL Ontologies
to Data Integration
• Ontologies endow terms with machine
processable definitions and disambiguate
different senses of the same expression
• Ontologies place restrictions on how terms
can be related to other terms so that misuse
and inconsistencies can be detected.
11
The Ontologized Web, Enterprise and
Social Network
• What if creators of web pages, databases, and blogs
used terminology from curated ontologies to annotate
their content?
• Standardized ways of describing the structures that
represent data are accepted; why not extend that
acceptance to the annotation of content?
• Expected Benefits:
– The precision of search increases dramatically
– Data from different sources can be merged
– Gaps in information can be identified
– Falsehoods and incoherent expressions can be detected
12
OVERVIEW OF RESOURCE
DESCRIPTION FRAMEWORK (RDF)
13
Resource Description Framework (RDF)
• Designed to be a language for making
assertions about resources
• A Resource* is
– an electronic document, an image, a source of
information with a consistent purpose
– not necessarily accessible via the Internet; e.g.,
human beings, corporations, and books in a library
can also be resources.
– an abstract concept such as the operators and
operands of a mathematical equation or types of a
relationship (e.g., "parent" or "employee")
*derived from RFC 3986-Uniform Resource Identifier (URI): Generic Syntax from http://tools.ietf.org/html/rfc3986
14
Expressing Information in RDF
• Statements are always expressed in the form of a
triple:
– Subject – Predicate – Object (a.k.a. RDF Triple)
• Translating the statement “Austria’s GDP per capita is
30,500 Euros” into RDF requires breaking it into triples
Subject | Predicate | Object
Austria | has economic indicator | Austria’s GDP per capita
Austria’s GDP per capita | has value | 30,500 Euros
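The two rows above can be sketched as plain Python tuples. This is only an illustration of the subject, predicate, object shape; the strings stand in for URIs and literals, and the helper name `objects_of` is an assumption made for this example.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# Plain strings stand in for URIs and literals in this sketch.
triples = [
    ("Austria", "has economic indicator", "Austria's GDP per capita"),
    ("Austria's GDP per capita", "has value", "30,500 Euros"),
]

def objects_of(subject, predicate, data):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in data if s == subject and p == predicate]

# Follow the chain: Austria -> its indicator -> the indicator's value.
indicator = objects_of("Austria", "has economic indicator", triples)[0]
value = objects_of(indicator, "has value", triples)[0]
print(value)
```

Breaking the sentence into two triples is what lets the intermediate resource (the GDP per capita) be referred to on its own.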
15
Uniform Resource Identifiers (URIs)
and Literals
• URIs are unique names of resources
– http://dbpedia.org/page/Austria
– http://en.wikipedia.org/wiki/Austria
• Literals
– can be a simple raw text value
– can be annotated with a language tag, as in “Austria”@en
– can be typed with a datatype, as in “30,500Euros”^^xsd:string
16
Rules for RDF Statements
• Subject and Predicate have to be URI named
resources
• Object – can be either a URI named resource
or a literal
17
Applying the Rules
Using “dbpedia:”, “ro:”, and “example:” as prefixes for:
http://dbpedia.org/page,
http://www.obofoundry.org/ro, and
http://www.myexample.com/resource respectively,
Which of the following are well-formed RDF statements?
Subject | Predicate | Object
dbpedia:Austria | ro:part_of | dbpedia:Europe
dbpedia:Austria | ro:part_of | “Europe”@en
“Europe” | ro:has_part | dbpedia:Austria
dbpedia:Austria | “is trading partner with” | dbpedia:Germany
dbpedia:Europe | ro:part_of | dbpedia:Austria
example:30500Euro | example:is_value_of | example:AustrianGDPperCapita
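The two rules can be applied mechanically to the six candidate statements. The sketch below assumes, purely for illustration, that a term is a literal when it begins with a double quote and is a prefixed URI name otherwise; `is_literal` and `is_well_formed` are hypothetical helper names.

```python
def is_literal(term):
    """Assumption for this sketch: a literal is written with a leading quote."""
    return term.startswith('"')

def is_well_formed(subject, predicate, obj):
    """Subject and predicate must be URI-named resources.
    The object may be either a URI-named resource or a literal,
    so it needs no check here."""
    return not is_literal(subject) and not is_literal(predicate)

statements = [
    ("dbpedia:Austria", "ro:part_of", "dbpedia:Europe"),
    ("dbpedia:Austria", "ro:part_of", '"Europe"@en'),
    ('"Europe"', "ro:has_part", "dbpedia:Austria"),
    ("dbpedia:Austria", '"is trading partner with"', "dbpedia:Germany"),
    ("dbpedia:Europe", "ro:part_of", "dbpedia:Austria"),
    ("example:30500Euro", "example:is_value_of", "example:AustrianGDPperCapita"),
]

results = [is_well_formed(*s) for s in statements]
for statement, ok in zip(statements, results):
    print(ok, statement)
```

Well-formed does not mean true: the statement that Europe is part of Austria passes the check even though it is false.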
18
RDF Graphs
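[Figure: an RDF graph. Nodes: dbpedia:Austria, example:Austrian_GDPperCapita, and the literal “30,500Euros”^^string. Edges: example:has_economic_indicator (from dbpedia:Austria to example:Austrian_GDPperCapita) and example:has_value (from example:Austrian_GDPperCapita to the literal).]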
The direction of the edges is always away from the subject
and towards the object of the statement
19
Graphing RDF
How would the following be represented in an RDF
graph?
Subject | Predicate | Object
game1:MonopolyPlayer_1 | rdf:type | mnply:MonopolyPlayer
game1:MonopolyPlayer_1 | mnply:has_role | game1:MonopolyBanker_Game1
game1:MonopolyPlayer_1 | mnply:represented_by | game1:MonopolyTokenBoot_Game1
game1:MonopolyPlayer_1 | mnply:competes_in | game1:MonopolyGame_Game1
20
Graphing RDF
[Figure: an RDF graph with game1:MonopolyPlayer_1 as the subject node and four outgoing edges: rdf:type to mnply:MonopolyPlayer, mnply:has_role to game1:MonopolyBanker_Game1, mnply:represented_by to game1:MonopolyTokenBoot_Game1, and mnply:competes_in to game1:MonopolyGame_Game1.]
21
How far does RDF take us toward our
goal?
• The value of RDF lies in the use of URIs, as it
allows distinct information sources to share a
common meaning for terms
– Every occurrence of the same URI is a reference to
the same resource
• RDF alone supports no inference and offers no way to
validate how URIs are used.
22
OVERVIEW OF RDF SCHEMA (RDFS)
23
RDF Schema (RDFS)
“RDF Schema defines classes and properties that
may be used to describe classes, properties
and other resources”*
RDFS defines terms that can describe classes of
things and the relationships that hold
between these classes
*RDF Vocabulary Description Language 1.0: RDF Schema from http://www.w3.org/TR/rdf-schema/
24
The Need for RDFS
• RDF can name, but not define, resources or the
relationships that hold between them
• But what about…
Apples are a kind of fruit
Subject | Predicate | Object
dbpedia:Apple | ex:is_kind_of | dbpedia:Fruit
25
The Need for RDFS
• Machines cannot process elements of an
expression that lie outside of RDF. To a
machine our example looks like:
Apples are a kind of fruit
Subject | Predicate | Object
tuvwxyz:Abcde | ef:ij_klmn_op | tuvwxyz:Fghij
• We need language elements that enable a
machine to process relationships between
entities
26
RDFS Types
• Allows a resource to be typed as a class (i.e. a
collection of individuals)
• Allows a class to be defined as a subclass of
another class (i.e. all individuals that it
contains are contained in the other)
• Allows a property to be defined as a
subproperty of another property
27
RDFS Taxonomies
• Enables the creation of taxonomies of both
classes and properties
Class Taxonomy
  Fruit
    Apple
      Cortland Apple
      Gala Apple
Property Taxonomy
  is related to
    is sibling of
      is brother of
      is sister of
28
RDFS Vocabulary
rdfs:Resource
rdfs:Class
rdfs:Literal
rdfs:Datatype
rdfs:range
rdfs:domain
rdfs:subClassOf
rdfs:subPropertyOf
rdfs:label
rdfs:comment
rdfs:ContainerMembershipProperty
rdfs:member
rdfs:seeAlso
rdfs:isDefinedBy
29
RDFS Vocabulary in Action
• rdfs:subClassOf is used to assert that every instance of
a class is an instance of another.
Apples are a kind of fruit
Subject | Predicate | Object
dbpedia:Apple | rdfs:subClassOf | dbpedia:Fruit
• If a resource is rdf:type dbpedia:Apple, a reasoner will
assert that the resource is also rdf:type dbpedia:Fruit
[Figure: example:NewtonsApple is rdf:type dbpedia:Apple, and dbpedia:Apple is rdfs:subClassOf dbpedia:Fruit; the inferred edge is example:NewtonsApple rdf:type dbpedia:Fruit.]
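The inference described above can be sketched in a few lines. This is a toy illustration, not how a production reasoner is built; the function name `infer_types` is an assumption, and the triples are plain strings.

```python
def infer_types(triples):
    """Add (x, rdf:type, D) whenever (x, rdf:type, C) and
    (C, rdfs:subClassOf, D) are present, until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for c, q, d in list(inferred):
                if q == "rdfs:subClassOf" and c == o:
                    new = (s, "rdf:type", d)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

asserted = {
    ("example:NewtonsApple", "rdf:type", "dbpedia:Apple"),
    ("dbpedia:Apple", "rdfs:subClassOf", "dbpedia:Fruit"),
}
closure = infer_types(asserted)
print(("example:NewtonsApple", "rdf:type", "dbpedia:Fruit") in closure)
```

The loop repeats until no new triple is added, so longer subclass chains are followed as well.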
30
RDFS Vocabulary in Action
• rdfs:subPropertyOf is used to assert that every
pair of resources that are related by a
property are also related by another.
Every sister of a person is a sibling of that person
Subject | Predicate | Object
ex:is_sister_of | rdfs:subPropertyOf | ex:is_sibling_of
• If Ann is the sister of Ben and is sister of is a
subproperty of is sibling of, then a reasoner
will assert that Ann is a sibling of Ben
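The Ann and Ben case can be sketched as follows. The identifiers ex:Ann and ex:Ben and the helper `apply_subproperties` are assumptions introduced for this illustration.

```python
def apply_subproperties(triples):
    """For each (P, rdfs:subPropertyOf, Q) and (x, P, y), add (x, Q, y)."""
    sub = [(s, o) for s, p, o in triples if p == "rdfs:subPropertyOf"]
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for child, parent in sub:
            for s, p, o in list(inferred):
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred

asserted = {
    ("ex:Ann", "ex:is_sister_of", "ex:Ben"),
    ("ex:is_sister_of", "rdfs:subPropertyOf", "ex:is_sibling_of"),
}
with_siblings = apply_subproperties(asserted)
print(("ex:Ann", "ex:is_sibling_of", "ex:Ben") in with_siblings)
```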
31
RDFS Vocabulary in Action
• rdfs:domain is used to assert that a property is always
applied to instances of one or more classes.
Only females can be sisters of others
Subject | Predicate | Object
ex:is_sister_of | rdfs:domain | ex:Female
• If Ann is related to Ben via the ex:is_sister_of property,
a reasoner will assert that Ann is rdf:type ex:Female
[Figure: example:Ann is linked to example:Ben by example:is_sister_of; the inferred edge is example:Ann rdf:type example:Female.]
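A minimal sketch of the rdfs:domain inference, assuming string triples and a hypothetical helper named `apply_domain`. Note that the rule types the subject of the property, never the object.

```python
def apply_domain(triples):
    """For each (P, rdfs:domain, C) and (x, P, y), add (x, rdf:type, C)."""
    inferred = set(triples)
    domains = [(s, o) for s, p, o in triples if p == "rdfs:domain"]
    for prop, cls in domains:
        for s, p, o in triples:
            if p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred

asserted = {
    ("ex:is_sister_of", "rdfs:domain", "ex:Female"),
    ("ex:Ann", "ex:is_sister_of", "ex:Ben"),
}
with_domain = apply_domain(asserted)
print(("ex:Ann", "rdf:type", "ex:Female") in with_domain)
```

The domain statement does not reject any data; it only licenses the extra rdf:type conclusion about Ann.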
32
RDFS Vocabulary in Action
• rdfs:range is used to assert that the objects of a
property are always instances of one or more classes
or datatypes
Only plants can bear fruit
Subject | Predicate | Object
ex:is_borne_by | rdfs:range | dbpedia:Plant
• If Newton’s apple is related to Newton’s apple tree via
the ex:is_borne_by property, a reasoner will assert that
Newton’s apple tree is rdf:type dbpedia:Plant
[Figure: example:Newton’s Apple is linked to example:Newton’s Apple Tree by example:is_borne_by; the inferred edge is example:Newton’s Apple Tree rdf:type dbpedia:Plant.]
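The range inference mirrors the domain case but types the object instead of the subject. The sketch below uses the hypothetical names ex:NewtonsApple, ex:NewtonsAppleTree, and `apply_range`.

```python
def apply_range(triples):
    """For each (P, rdfs:range, C) and (x, P, y), add (y, rdf:type, C)."""
    inferred = set(triples)
    ranges = [(s, o) for s, p, o in triples if p == "rdfs:range"]
    for prop, cls in ranges:
        for s, p, o in triples:
            if p == prop:
                inferred.add((o, "rdf:type", cls))
    return inferred

asserted = {
    ("ex:is_borne_by", "rdfs:range", "dbpedia:Plant"),
    ("ex:NewtonsApple", "ex:is_borne_by", "ex:NewtonsAppleTree"),
}
with_range = apply_range(asserted)
print(("ex:NewtonsAppleTree", "rdf:type", "dbpedia:Plant") in with_range)
```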
33
RDFS Vocabulary in Action
• rdfs:label is used to provide a human readable
version of a resource’s name.
• If a GUID is used as the identifier for the class
of Apple, then use rdfs:label to assign as many
human readable versions as desired.
Subject | Predicate | Object
ex:EXO_0002032 | rdfs:label | “Apple”@en
ex:EXO_0002032 | rdfs:label | “Manzana”@es
ex:EXO_0002032 | rdfs:label | “Mela”@it
34
RDFS Vocabulary in Action
• rdfs:comment is used to provide a human-readable description of a resource
Subject | Predicate | Object
dbpedia:Apple | rdfs:comment | “The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family”@en
dbpedia:Apple | rdfs:comment | “La mela è il frutto (più precisamente si tratta di un falso frutto a pomo) del melo.”@it
Both comments are reused from http://dbpedia.org/page/Apple
35
RDFS Vocabulary in Action
• rdfs:seeAlso is used to assert that a resource
provides additional information about the
subject resource.
Subject | Predicate | Object
dbpedia:Apple | rdfs:seeAlso | wiki:Apple
dbpedia:Apple | rdfs:seeAlso | ex:Apple
36
RDFS Vocabulary in Action
• rdfs:isDefinedBy is used to assert that a
resource defines the subject resource.
Subject | Predicate | Object
dbpedia:Apple | rdfs:isDefinedBy | wiktionary:apple
dbpedia:Apple | rdfs:isDefinedBy | wordnet:apple
37
How far does RDFS take us toward our
goal?
• Contains elements that enable machine
inferencing on necessary conditions (e.g. every
apple is the fruit of an apple tree)
• Doesn’t allow restrictions on classes that would
enable inferencing on sufficient conditions (e.g.
every fruit of an apple tree is an apple)
• Doesn’t provide a way to exclude resources from
class membership, so assertions cannot be validated.
38
OVERVIEW OF THE WEB ONTOLOGY
LANGUAGE (OWL)
39
Web Ontology Language (OWL*)
• OWL is the descendant of knowledge
representation languages of the 1990s, such
as Simple HTML Ontology Extensions (SHOE)
and Ontology Inference Layer (OIL), and of
the DARPA Agent Markup Language (DAML)
• The initial version of OWL became a formal
W3C Recommendation on February 10, 2004
• OWL 2 became a W3C Recommendation on October
27, 2009
* why “OWL” instead of “WOL” http://lists.w3.org/Archives/Public/www-webont-wg/2001Dec/0169.html
40
The Need for OWL
• RDFS lacks the expressive power to allow inferences about individuals
beyond their class membership.
Subject | Predicate | Object
dbpedia:Apple | rdf:type | rdfs:Class
dbpedia:Apple | rdfs:subClassOf | ex:FruitOfAppleTree
ex:FruitOfAppleTree | rdf:type | rdfs:Class
ex:FruitOfAppleTree | rdfs:subClassOf | dbpedia:Apple
• Based on this equivalence a machine can infer only that the two
classes have the same instances.
• We want to enable a machine to infer the attributes of an individual
based upon the definition of the classes of which it is a member
41
OWL Usage
“The W3C OWL 2 Web Ontology Language
(OWL) is a Semantic Web language designed
to represent rich and complex knowledge
about things, groups of things, and relations
between things. OWL is a computational
logic-based language such that knowledge
expressed in OWL can be reasoned with by
computer programs either to verify the
consistency of that knowledge or to make
implicit knowledge explicit.”*
* http://www.w3.org/TR/owl2-primer/
42
Defining Classes - Enumeration
Use owl:oneOf to enumerate the members of a
class
In Manchester Syntax
Class: MonopolyToken
    EquivalentTo:
        {Battleship, Boot, Car, Dog, Thimble, Top_Hat, Wheelbarrow, Iron}
    SubClassOf:
        Thing
43
Defining Classes - Restrictions
• owl:Restriction creates a class defined using an
object property and either:
– a value constraint which places a constraint on the
range of the property when applied to this particular
class
• e.g. the rdfs:range of the is_borne_by property might be
plant, but when defining apple we would constrain the range
to the class of apple trees
– a cardinality constraint which places a constraint on
the number of values a property can take in the
context of a particular class
• e.g. there can be no more than 8 players in a game of
Monopoly
44
Additional Inferences Gained Through
Restrictions
Without a restriction, all that can be inferred about an improved property
is that it must also be a property
Class: MonopolyImprovedProperty
    SubClassOf:
        MonopolyProperty
Adding a restriction adds the information that an improved property must
be a property and that it must be the location of some building
Class: MonopolyImprovedProperty
    EquivalentTo:
        location_of some MonopolyBuilding
    SubClassOf:
        MonopolyProperty
45
rdfs:subClassOf vs. owl:equivalentClass
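[Figure: two panels. In the first, “improved property” is a subclass of “property that is the location of a building”, and it is left open (“?”) whether “Virginia Place is the location of House 1” places Virginia Place under “improved property”. In the second, “improved property” is an equivalent class of “property that is the location of a building”, so “Virginia Place is the location of House 1” places Virginia Place under “improved property”.]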
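The difference between the two axioms can be sketched in code. This is a toy model under stated assumptions: House_1 is taken to be a MonopolyBuilding, and the helper names `is_location_of_some_building` and `infer_improved` are invented for the example.

```python
# Hypothetical facts: one location_of assertion and the known types.
facts = {("VirginiaPlace", "location_of", "House_1")}
types = {"House_1": {"MonopolyBuilding"}}

def is_location_of_some_building(individual):
    """Check the condition `location_of some MonopolyBuilding`."""
    return any(
        s == individual
        and p == "location_of"
        and "MonopolyBuilding" in types.get(o, set())
        for s, p, o in facts
    )

def infer_improved(individual, axiom):
    """Return True when membership in MonopolyImprovedProperty follows.

    With SubClassOf the condition is only necessary, so meeting it proves
    nothing. With EquivalentTo it is also sufficient.
    """
    if axiom == "EquivalentTo":
        return is_location_of_some_building(individual)
    return False

print(infer_improved("VirginiaPlace", "SubClassOf"))    # False
print(infer_improved("VirginiaPlace", "EquivalentTo"))  # True
```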
46
owl:allValuesFrom vs. owl:someValuesFrom
• owl:allValuesFrom constrains the object property
so that its value must come from the specified
class or data range
– Example: A mortgaged property is one such that it is
owned only by the bank
• owl:someValuesFrom constrains the object
property so that at least one of its values must
come from the specified class or data range
– Example: An improved property is the location of
some building
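The contrast between the two constraints can be sketched as checks over known data. This is a closed-world simplification: an OWL reasoner working under the Open World assumption would not conclude that an allValuesFrom condition holds merely because no counterexample is listed. The individuals and the owned_by property below are hypothetical.

```python
def values(individual, prop, triples):
    """Collect the known values of `prop` for `individual`."""
    return [o for s, p, o in triples if s == individual and p == prop]

def some_values_from(individual, prop, members, triples):
    """At least one known value of `prop` belongs to the class `members`."""
    return any(v in members for v in values(individual, prop, triples))

def all_values_from(individual, prop, members, triples):
    """Every known value of `prop` belongs to `members`.
    This is trivially true when there are no values at all."""
    return all(v in members for v in values(individual, prop, triples))

triples = [
    ("BalticAvenue", "owned_by", "Bank"),
    ("ParkPlace", "owned_by", "Player_1"),
    ("ParkPlace", "location_of", "House_1"),
]
banks = {"Bank"}
buildings = {"House_1"}

print(all_values_from("BalticAvenue", "owned_by", banks, triples))
print(some_values_from("ParkPlace", "location_of", buildings, triples))
```

An individual with no values for a property satisfies the "all" check but fails the "some" check, which is the key behavioral difference between the two.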
47
owl:hasValue
• The owl:hasValue constraint limits a property to a
given value, which can be either an individual or a data value.
For example, we could use this constraint to assert that all
Monopoly railroads have a price of 200.
Class: MonopolyRailroad
    SubClassOf:
        has_price value 200,
        MonopolyProperty
• Given a resource that is a Monopoly Railroad, a reasoner will
infer that its price is 200.
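[Figure: game1:ReadingRailroad is rdf:type mnply:MonopolyRailroad, and mnply:MonopolyRailroad is rdfs:subClassOf the restriction “mnply:has_price = 200”; the inferred edge is game1:ReadingRailroad mnply:has_price 200.]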
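The railroad example can be sketched as a lookup from class to the property value it implies. The dictionary `has_value_axioms` and the function `infer_has_value` are assumptions made for this illustration.

```python
# Class -> (property, value) pairs implied by hasValue restrictions.
has_value_axioms = {"mnply:MonopolyRailroad": ("mnply:has_price", 200)}

def infer_has_value(triples):
    """For each (x, rdf:type, C) where C carries a hasValue restriction
    on property P with value v, add (x, P, v)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type" and o in has_value_axioms:
            prop, val = has_value_axioms[o]
            inferred.add((s, prop, val))
    return inferred

asserted = {("game1:ReadingRailroad", "rdf:type", "mnply:MonopolyRailroad")}
priced = infer_has_value(asserted)
print(("game1:ReadingRailroad", "mnply:has_price", 200) in priced)
```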
48
owl:hasValue
• To define the class of New York City buildings we can use
owl:hasValue on the property located_in and the individual
NewYorkCity
Class: NewYorkCityBuilding
    SubClassOf:
        located_in value NewYorkCity,
        Building
• Given a resource that is a New York City building, a reasoner
will infer that its location is New York City.
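[Figure: example:EmpireStateBuilding is rdf:type example:NewYorkCityBuilding, and example:NewYorkCityBuilding is rdfs:subClassOf the restriction “example:located_in NYC”; the inferred edge is example:EmpireStateBuilding example:located_in example:NewYorkCity.]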
49
Cardinality Constraints
• Useful in expressing that a class has an exact number of
relationships to another class or data range.
Example: A turn has exactly one player as a participant and exactly one
integer as its ordinal value
Class: MonopolyTurn
    Annotations:
        rdfs:label "Monopoly turn"^^xsd:string
    SubClassOf:
        has_ordinal_value exactly 1 xsd:integer,
        has_participant exactly 1 MonopolyPlayer,
        occurs_containing some MonopolyRollOfDice,
        occurs_during some MonopolyRound,
        MonopolyEvent
50
Cardinality Constraints
• Can also express that the number of instances of a given
relationship between a class and another class or data range can
span a range of values
Example: A color group can have between 2 and 3 properties as
members.
Class: MonopolyColorGroup
    SubClassOf:
        owl:Thing,
        (has_member min 2 MonopolyProperty)
        and (has_member max 3 MonopolyProperty)
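The min and max bounds can be sketched as a simple count over asserted members. This is a closed-world data check, not OWL reasoning: a reasoner would also have to consider unlisted members and the possibility that two names denote the same property. The group and member names are hypothetical.

```python
def count_members(group, triples):
    """Count the has_member assertions recorded for a group."""
    return sum(1 for s, p, o in triples if s == group and p == "has_member")

def within_bounds(group, triples, minimum=2, maximum=3):
    """Check `has_member min 2` and `has_member max 3` against the data."""
    return minimum <= count_members(group, triples) <= maximum

triples = [
    ("BrownGroup", "has_member", "MediterraneanAvenue"),
    ("BrownGroup", "has_member", "BalticAvenue"),
    ("OddGroup", "has_member", "PropertyA"),
    ("OddGroup", "has_member", "PropertyB"),
    ("OddGroup", "has_member", "PropertyC"),
    ("OddGroup", "has_member", "PropertyD"),
]

print(within_bounds("BrownGroup", triples))
print(within_bounds("OddGroup", triples))
```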
51
Set Operators
• owl:intersectionOf - a class is formed from
the individuals that are common to two or
more classes
• owl:unionOf – a class is formed from the
individuals that are in any of two or more
classes
• owl:complementOf – a class is formed from
the individuals that are not members of a
class
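Python's built-in sets give a direct analogy for the three operators. The sketch assumes a small, finite universe of individuals; in OWL the complement is taken relative to owl:Thing, which is not enumerated. All names are hypothetical.

```python
# A small, finite stand-in for "everything that exists".
universe = {"BalticAvenue", "ParkPlace", "ReadingRailroad", "Player_1"}

mortgaged = {"BalticAvenue", "ReadingRailroad"}
railroads = {"ReadingRailroad"}

intersection = mortgaged & railroads   # analogous to owl:intersectionOf
union = mortgaged | railroads          # analogous to owl:unionOf
complement = universe - mortgaged      # analogous to owl:complementOf

print(sorted(intersection))
print(sorted(union))
print(sorted(complement))
```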
52
owl:equivalentClass and owl:disjointWith
• owl:equivalentClass establishes that two
classes have the same instances
– this is similar to owl:sameAs, which, when applied to
two classes, establishes that they have the same intension
• owl:disjointWith establishes that two classes
have no members in common
53
Defining Properties - Subtypes
• Object Property – used to link individuals to
individuals
• Datatype Property – used to link individuals to
data values
• Annotation Property – used to link ontology
elements to metadata
54
Defining Properties – Relations to
Other Properties
• owl:equivalentProperty – behaves similarly to
owl:equivalentClass: two properties are
equivalent if and only if they have the same
members (i.e. they have the same extension)
• owl:inverseOf – if x is related to y by
property A and property A is the inverse of
property B, then y is related to x by property
B
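The inverse rule can be sketched with the ro:part_of and ro:has_part properties used earlier in the tutorial, treating them as inverses for the purpose of this example. The helper `apply_inverse` is a hypothetical name.

```python
def apply_inverse(triples, prop_a, prop_b):
    """If prop_a is the inverse of prop_b, every (x, prop_a, y) yields
    (y, prop_b, x), and every (x, prop_b, y) yields (y, prop_a, x)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == prop_a:
            inferred.add((o, prop_b, s))
        elif p == prop_b:
            inferred.add((o, prop_a, s))
    return inferred

asserted = {("dbpedia:Austria", "ro:part_of", "dbpedia:Europe")}
both_ways = apply_inverse(asserted, "ro:part_of", "ro:has_part")
print(("dbpedia:Europe", "ro:has_part", "dbpedia:Austria") in both_ways)
```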
55
Defining Properties – Cardinality
Constraints
• owl:FunctionalProperty is used to place a
uniqueness constraint on the value of the range
of a property for each value in the domain of that
property.
[Figure: game1:MonopolyPlayer_1 is linked by a functional property to both game1:MonopolyTokenRailroad_1 and game1:MonopolyTokenRailroad_2; the two tokens are therefore connected by owl:sameAs.]
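Because OWL does not assume unique names, two values for a functional property are not an error; they lead to the conclusion that both names denote the same individual. The sketch below assumes the functional property is mnply:represented_by, which the slide does not state, and uses a hypothetical helper `infer_same_as`.

```python
from itertools import combinations

def infer_same_as(triples, functional_prop):
    """A functional property has at most one value per subject, so any
    two values recorded for the same subject must be the same individual."""
    values_by_subject = {}
    for s, p, o in triples:
        if p == functional_prop:
            values_by_subject.setdefault(s, set()).add(o)
    same = set()
    for vals in values_by_subject.values():
        # sorted() makes the direction of each emitted pair predictable
        for a, b in combinations(sorted(vals), 2):
            same.add((a, "owl:sameAs", b))
    return same

asserted = {
    ("game1:MonopolyPlayer_1", "mnply:represented_by", "game1:MonopolyTokenRailroad_1"),
    ("game1:MonopolyPlayer_1", "mnply:represented_by", "game1:MonopolyTokenRailroad_2"),
}
same_as = infer_same_as(asserted, "mnply:represented_by")
print(same_as)
```

An inverse functional property works the same way with the roles swapped: two subjects sharing one value are inferred to be the same individual.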
56
Defining Properties – Cardinality
Constraints
• owl:InverseFunctionalProperty is used to place a
uniqueness constraint on the value of the domain
of a property for each value in the range of that
property
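[Figure: game1:MonopolyPlayer_1 and game1:MonopolyPlayer_2 are both linked by an inverse functional property to game1:MonopolyTokenRailroad_1; the two players are therefore connected by owl:sameAs.]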
57
Defining Properties – Logical
Characteristics
• Symmetric Property – P is a symmetric property if
aPb implies bPa
• Asymmetric Property – P is an asymmetric
property if aPb implies not bPa
• Reflexive Property – P is a reflexive property if
aPa for every a
• Irreflexive Property – P is an irreflexive property if
aPa for no a
• Transitive Property – P is a transitive property if
aPb and bPc imply aPc
58
A Few Examples
• The relationship of being adjacent to is symmetric
– If Mediterranean Avenue is adjacent to Go, then Go is
adjacent to Mediterranean Avenue
• The relationship of having a role is asymmetric
– If Player_1 has the role of banker, then the role of banker
does not have the role of Player_1
• The relationship of occurring prior to is transitive
– If Round_1 occurs prior to Round_2 and Round_2 occurs
prior to Round_3, then Round_1 occurs prior to Round_3
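The symmetric and transitive examples above can be sketched as closures over sets of pairs. The function names are assumptions for this illustration; each pair (a, b) stands for the statement aPb.

```python
def symmetric_closure(pairs):
    """Add (b, a) for every (a, b)."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, repeating
    until no new pair appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new

adjacent_to = {("MediterraneanAvenue", "Go")}
occurs_prior_to = {("Round_1", "Round_2"), ("Round_2", "Round_3")}

print(symmetric_closure(adjacent_to))
print(transitive_closure(occurs_prior_to))
```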
59
Multi-typed Properties
Properties can be typed by more than one of the logical
characteristics
ObjectProperty: adjacent_to
    Annotations:
        rdfs:label "adjacent to"^^xsd:string
    Characteristics:
        Irreflexive,
        Symmetric
    Domain:
        MonopolyBoardSpace
    Range:
        MonopolyBoardSpace
60
A COUPLE OF IMPORTANT
ASSUMPTIONS
61
No Assumption of Unique Names
• There is no assumption that two resources
with different names represent
distinct entities
• This holds for any type of resource: class,
property, datatype or instance
62
Open World Assumption
• Some data management systems use the Closed
World assumption, meaning that if a fact is not
found among the data in the system, it is
assumed to be false.
– In a sales database, if the name “Steve Wozniak” does
not appear in the customer table, then Mr. Wozniak is
not a customer of that company
• In Semantic Web applications, the Open World
assumption is used, meaning that if a fact is not
found among a set of data it is not assumed to be
false.
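The two assumptions can be contrasted with a three-valued answer. In this sketch `None` stands for "unknown"; the customer name stored in the table is hypothetical, and the function names are invented for the example.

```python
# Hypothetical contents of the customer table.
customers = {"Ada Lovelace"}

def closed_world_is_customer(name):
    """Closed World: anything not recorded is taken to be false."""
    return name in customers

def open_world_is_customer(name):
    """Open World: True if asserted, otherwise None (unknown),
    never False merely because the fact is absent."""
    return True if name in customers else None

print(closed_world_is_customer("Steve Wozniak"))  # False
print(open_world_is_customer("Steve Wozniak"))    # None
```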
63