
Language Technologies and the Semantic Web:
An Essential Relationship.
Enrico Motta
Professor of Knowledge Technologies
Knowledge Media Institute
The Open University
Content of the Talk
• Update on the Semantic Web
– Beyond the hype
• What it is
• Why it is interesting
• What’s its status?
• Semantic Web and AI
• Semantic Web Applications
– Key features
– Reasoning on the Semantic Web
– Key role of Language Technologies
• Conclusions
The Semantic Web in 2 minutes…
<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
<foaf:name>Enrico Motta</foaf:name>
<foaf:firstName>Enrico</foaf:firstName>
<foaf:surname>Motta</foaf:surname>
<foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
<foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
<foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
<foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
<foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
<foaf:topic_interest>Semantic Web</foaf:topic_interest>
<foaf:topic_interest>Ontologies</foaf:topic_interest>
<foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
<foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
<foaf:topic_interest>Knowledge Management</foaf:topic_interest>
<foaf:based_near>
<geo:Point>
<geo:lat>52.024868</geo:lat>
<geo:long>-0.707143</geo:long>
<contact:nearestAirport>
<airport:name>London Luton Airport</airport:name>
<airport:iataCode>LTN</airport:iataCode>
<airport:location>Luton, United Kingdom</airport:location>
<geo:lat>51.866666666667</geo:lat>
<geo:long>-0.36666666666667</geo:long>
<rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
<foaf:currentProject>
<foaf:Project>
<foaf:name>AquaLog</foaf:name>
The FOAF ontology
The SW as ‘Web of Data’
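The "web of data" idea can be made concrete by reading a FOAF description as subject-predicate-object statements. The following is a minimal sketch, assuming a shortened, well-formed fragment of the profile shown earlier; the helper names foaf_triples and short_name are hypothetical, and the code is a toy reader for this one flat shape, not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Shortened, well-formed fragment of the profile (assumption: namespaces
# declared on an enclosing rdf:RDF element, which the slide does not show).
FOAF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>"""


def short_name(tag):
    """Turn '{namespace}local' into 'foaf:local' for FOAF terms."""
    namespace, _, local = tag[1:].partition("}")
    return f"foaf:{local}" if namespace == FOAF_NS else tag


def foaf_triples(xml_text):
    """Return (subject, predicate, object) tuples for each foaf:Person."""
    root = ET.fromstring(xml_text)
    triples = []
    for person in root.findall(f"{{{FOAF_NS}}}Person"):
        subject = person.get(f"{{{RDF_NS}}}about")
        triples.append((subject, "rdf:type", "foaf:Person"))
        for prop in person:
            # The object is either a linked resource or a literal value.
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, short_name(prop.tag), value))
    return triples


for triple in foaf_triples(FOAF_XML):
    print(triple)
```

Each printed tuple is one statement about the same resource, which is what lets independent applications merge data published by different sources.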
Current status of the Semantic Web
• 10-20 million semantic web documents
– Expressed in RDF, OWL, DAML+OIL
• 7K-10K ontologies
– These cover a variety of domains: multimedia, computing, management, bio-medical sciences, geography, entertainment, upper level concepts, etc.
The above figures refer to resources which are publicly accessible on the web
The Semantic Web today
• To a significant extent the Semantic Web is already in place and is characterized by a widespread production of formalized knowledge models (ontologies and metadata), from a variety of different groups and individuals
– “The Next Knowledge Medium - An information network with semi-automated services for the generation, distribution, and consumption of knowledge”
• Stefik, 1986
– “Knowledge modelling to become a new form of literacy?”
• Stutt and Motta, 1997
• Still primarily a research enterprise; however, interest is rapidly increasing in both governmental and business organizations
• “early adopters” phase
• The result is slowly emerging as an unprecedented knowledge resource, which can enable a new generation of intelligent applications on the web
Semantic Web Applications
What can you do with the Semantic Web?
“Corporate Semantic Webs”
• A ‘corporate ontology’ is used to provide a homogeneous view over heterogeneous data sources
• Often tackle Enterprise Information Integration scenarios
• Hailed by Gartner as one of the key emerging strategic technology trends
– E.g., see personal information management in Garlik
Exploiting large scale semantics
Next Generation
SW Applications
Semantic
Web
NGSW Applications in the context of AI research
[Figure labels: Knowledge-Based Systems; Large Body of Knowledge; Intelligent Behaviour]
“Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use”
• Goldstein and Papert, 1977
The Knowledge Acquisition Bottleneck
[Figure labels: Knowledge; Large Body of Knowledge; KA Bottleneck; Intelligent Behaviour]
SW as Enabler of Intelligent Behaviour
Both a platform for knowledge publishing and a large scale source of knowledge
[Figure label: Intelligent Behaviour]
KBS vs SW Systems
                  Classic KBS     SW Systems
Provenance        Centralized     Distributed
Size              Small/Medium    Extra Huge
Repr. Schema      Homogeneous     Heterogeneous
Quality           High            Very Variable
Degree of trust   High            Very Variable
Key Paradigm Shift
Intelligent Behaviour
• Classic KBS: a function of sophisticated, logical, task-centric problem solving
• SW Systems: a side-effect of being able to integrate different types of reasoning to handle size and heterogeneous quality and representation
Next Generation SW Applications: Examples
Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain
Method
– SCARLET: matching by Harvesting the SW
– Automatically select and combine multiple online ontologies to derive a relation
[Figure: Scarlet accesses the Semantic Web to deduce a semantic relation between Concept_A (e.g., Supermarket) and Concept_B (e.g., Building)]
Two strategies
[Figure: (A) Scarlet relates Supermarket to Building using one ontology containing Supermarket, Shop, PublicBuilding and Building; (B) Scarlet relates Cholesterol to OrganicChemical using two ontologies, one containing Cholesterol and Steroid, the other containing Steroid, Lipid and OrganicChemical]
Deriving relations from (A) one ontology and (B) across ontologies.
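The two strategies can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SCARLET implementation: the three toy ontologies, their names, the direction of the subclass links, and the helper functions ancestors and derive_subclass are all hypothetical, and only subclass relations are considered.

```python
# Each toy ontology maps a concept to its direct superclasses (assumed links).
ONTOLOGIES = {
    "buildings": {
        "Supermarket": ["Shop"],
        "Shop": ["PublicBuilding"],
        "PublicBuilding": ["Building"],
    },
    "chem_a": {"Cholesterol": ["Steroid"]},
    "chem_b": {"Steroid": ["Lipid"], "Lipid": ["OrganicChemical"]},
}


def ancestors(ontology, concept):
    """All superclasses of `concept` reachable inside a single ontology."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in ontology.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def derive_subclass(ontologies, concept_a, concept_b):
    """Try to show concept_a is a subclass of concept_b.

    Returns (strategy, ontology_names) or None if nothing is found.
    """
    # Strategy (A): one ontology covers both concepts and links them.
    for name, ontology in ontologies.items():
        if concept_b in ancestors(ontology, concept_a):
            return ("single", [name])
    # Strategy (B): chain through an intermediate concept shared by
    # two different ontologies.
    for name1, onto1 in ontologies.items():
        for middle in ancestors(onto1, concept_a):
            for name2, onto2 in ontologies.items():
                if name2 != name1 and concept_b in ancestors(onto2, middle):
                    return ("cross", [name1, name2])
    return None


print(derive_subclass(ONTOLOGIES, "Supermarket", "Building"))
print(derive_subclass(ONTOLOGIES, "Cholesterol", "OrganicChemical"))
```

In this sketch the single-ontology strategy is tried first because it relies on one source only; the cross-ontology strategy extends coverage at the cost of trusting that the shared intermediate concept means the same thing in both sources.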
Experiment
Matching:
• AGROVOC
– UN’s Food and Agriculture Organisation (FAO) thesaurus
– 28,174 descriptor terms
– 10,028 non-descriptor terms
• NALT
– US National Agricultural Library Thesaurus
– 41,577 descriptor terms
– 24,525 non-descriptor terms
226 Used Ontologies
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://gate.ac.uk/projects/htechsight/Technologies.daml
http://reliant.teknowledge.com/DAML/Economy.daml
Evaluation 1 - Precision
• Manual assessment of 1000 mappings (15%)
• Evaluators:
– Researchers in the area of the Semantic Web
– 6 people split in two groups
• Results:
– Comparable to the best results reported for matchers based on background knowledge.
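Precision over a manually assessed sample is the share of assessed mappings judged correct. A minimal sketch follows; the verdict lists are invented purely for illustration (the slides give no per-mapping data), and the helper name precision is hypothetical.

```python
def precision(verdicts):
    """Fraction of assessed mappings that evaluators judged correct."""
    if not verdicts:
        raise ValueError("no assessed mappings")
    return sum(1 for verdict in verdicts if verdict) / len(verdicts)


# Invented verdicts for two evaluator groups (True = mapping judged correct).
group_1 = [True, True, False, True]
group_2 = [True, False, False, True]

print(precision(group_1))
print(precision(group_2))
```

Reporting one figure per group, as here, makes disagreement between evaluator groups visible before any averaging.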
Evaluation 2 – Error Analysis
Other Case Studies…
Giving meaning to tags
Example
Cluster_1: {college commerce corporate course education high instructing learn learning lms school student}
[Figure: ontology concepts and relations linked to the cluster; bracketed numbers give the source ontology]
activities [4], learning [4], teaching [4], education, training [1,4], qualification, school [2], corporate [1], institution, postSecondarySchool [2], student [3], studiesAt, takesCourse, university [2,3], offersCourse, course [3], college [2]
[1] http://gate.ac.uk/projects/htechsight/Employment.daml
[2] http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
[3] http://www.mondeca.com/owl/moses/ita.owl
[4] http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl
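One simple way to picture how tags acquire meaning is to look each tag up among the concept names found in online ontologies. The sketch below assumes exact, case-insensitive name matching, which is a simplification chosen for illustration and not necessarily the method used in the case study. The concept-to-source table repeats the numbered concepts from the example; the helper name ground_tags is hypothetical.

```python
# Tags from Cluster_1 in the example.
CLUSTER = ["college", "commerce", "corporate", "course", "education", "high",
           "instructing", "learn", "learning", "lms", "school", "student"]

# Concept name -> numbered source ontologies, as listed in the example.
CONCEPT_SOURCES = {
    "activities": [4], "learning": [4], "teaching": [4], "training": [1, 4],
    "school": [2], "corporate": [1], "student": [3], "university": [2, 3],
    "course": [3], "college": [2],
}


def ground_tags(tags, concept_sources):
    """Split tags into those matching a known concept and those that do not."""
    grounded = {}
    ungrounded = []
    for tag in tags:
        sources = concept_sources.get(tag.lower())
        if sources:
            grounded[tag] = sources
        else:
            ungrounded.append(tag)
    return grounded, ungrounded


grounded, ungrounded = ground_tags(CLUSTER, CONCEPT_SOURCES)
print("grounded:", grounded)
print("ungrounded:", ungrounded)
```

Tags such as "learn" or "lms" stay ungrounded under exact matching, which is one reason language technologies (stemming, synonym handling) are needed alongside the ontologies.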
Conclusions
Typical misconceptions…
• “The SW is a long-term vision…”
– Ehm…actually… it already exists…
• “The SW will never work because nobody is going to annotate their web pages”
– The SW is not about annotating web pages, the SW is a web of data, most of which are generated from DBs, or from web mining software, or from applications which produce SW data as a side effect of supporting users’ tasks
• “The idea of a universal ontology has failed before and will fail again. Hence the SW is doomed”
– The SW is not about a single universal ontology. Already there are around 10K ontologies and the number is growing…
– SW applications may use 1, 2, 3, or even hundreds of ontologies.
SW and Language Technologies
• All the applications mentioned here combine language, web, statistical and semantic technologies
• Heterogeneity and sloppy modelling imply that language and statistical technologies are almost always needed when building NGSW apps
• In contrast with traditional KBS, intelligent behaviour is more a side-effect of integrating multiple techniques to handle scale and heterogeneity than a function of powerful deductive reasoning