YAGO:
Yet Another Great Ontology
PhD Defense
Fabian M. Suchanek
(Max-Planck Institute for Informatics, Saarbrücken)
Fabian M. Suchanek
YAGO - A Core of Semantic Knowledge
1
Overview
• Motivation: Why would anybody need Ontologies?
• Building a Core Ontology: YAGO
• Extending the Core Ontology: SOFIE
Santa Claus in Need
World population
The Search for a Second Santa Claus
Query: strong, tall guy, Australian
Seeking strong, tall Australian man
I'm 27, blue eyes, looking for a tall strong Australian man.
girls-seek-guys.com/london/42 - Cached - Similar pages
The Search for a Second Santa Claus
Query: strong person, > 1.90, Australian
Seeking strong, tall Australian man
I'm 27, blue eyes, looking for a tall strong Australian man. ... I'm 190 kg
girls-seek-guys.com/london/42 - Cached - Similar pages
The Search for a Second Santa Claus
Query: Hi Larry, it's me, Santa Claus. I think you misunderstood wh
Seeking strong, tall Australian man
I'm 27, blue eyes, looking for a tall strong Australian man.
girls-seek-guys.com/london/42 - Cached - Similar pages
Solution: An Ontology
[Figure: ontology graph with the nodes physical entity, person, continent, Australia and 1.90m, connected by edges labeled "is a", "height" and "isFrom"]
Solution: An Ontology
[Figure: ontology graph with the nodes physical entity, person, continent and Australia, the edges "is a" and "isFrom", and the annotations Classes, Relations and Individuals]
Vision
Gathering the knowledge of this world
in a structured ontology.
• Semantic Search
• Question answering
• Machine Translation
• Document classification
• …
Plan of Attack
• Motivation
• Building a Core Ontology: YAGO
• Extending the Core Ontology: SOFIE
YAGO: Goal
Goal: Build a Large Ontology
Previous Approaches:
• Assemble the ontology manually
  (WordNet, SUMO, Cyc, GeneOntology)
  Problem: Usually low coverage (MPI is in none of these)
• Use community work (Semantic Wikipedia, Freebase)
  Problem: We don't know yet whether it will take off
YAGO: Goal
Goal: Build a Large Ontology
Our Approach:
• Extract knowledge from Wikipedia and WordNet
  (securing high coverage)
• Use extensive quality control techniques
  (securing high consistency)
YAGO: Infoboxes
[Figure: mock Wikipedia article "Claus K" with filler body text and an infobox]
Infobox entry "Born in: Sydney"  ~~>  Claus K  bornIn  Sydney
• Exploit infoboxes
YAGO: Categories
[Figure: mock Wikipedia article "Claus K" with filler body text and a category list]
Category "1980_births"  ~~>  Claus K  born  1980
Already extracted: Claus K  bornIn  Sydney
• Exploit infoboxes
• Exploit relational categories
YAGO: Categories
[Figure: mock Wikipedia article "Claus K" with filler body text and a category list]
Category "Australian Boxers"  ~~>  Claus K  isA  Australian Boxer
Already extracted: Claus K  born  1980; Claus K  bornIn  Sydney
• Exploit infoboxes
• Exploit relational categories
• Exploit conceptual categories
YAGO: Categories
[Figure: mock Wikipedia article "Claus K" with filler body text and a category list]
Category "Kick boxing" must not yield  Claus K  isA  Kick boxing
Already extracted: Claus K  isA  Australian Boxer; Claus K  born  1980; Claus K  bornIn  Sydney
• Exploit infoboxes
• Exploit relational categories
• Exploit conceptual categories
• Avoid thematic categories
YAGO: Upper Model
[Figure: an individual (born 1980) is a "Australian boxer"; the link from "Australian boxer" up to person and entity is still open (?)]
YAGO: Upper Model
[Figure: the same individual (born 1980, is a "Australian boxer"); candidate superclasses shown: People_by_occupation, Social_group, Business (?)]
YAGO: Upper Model
WordNet:    Boxer  subclass  Person
Wikipedia:  Australian boxer  subclass  Boxer
            (individual, born 1980)  is a  Australian boxer
[Suchanek et al.: WWW 2007]
YAGO: Quality Control
1. Canonicalization
   1. ... of entities
      "Santa Klaus", "Santa Clause", "Santa"  ~~>  Santa Claus
YAGO: Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
      born 1980
      born 1980-12-19
YAGO: Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
2. Type Checks
   1. Reductive Type Checking
      range(bornOnDate, timepoint)
      bornOnDate(Claus_Kent, Sydney)
YAGO: Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
2. Type Checks
   1. Reductive Type Checking
   2. Type Coherence Checking
      Boxer, Swimmer, Flight instructor, Airplane
      (class tree: Entity with the subclasses Person and Artifact)
YAGO: Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
   Every fact and every entity occurs exactly once
2. Type Checks
   1. Reductive Type Checking
   2. Type Coherence Checking
   Every fact fulfills its type constraints
[Suchanek et al.: JWS 2008]
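A minimal sketch of reductive type checking, assuming that it means dropping every candidate fact whose second argument does not have the type demanded by the relation's range. The `RANGE` and `TYPE` tables and the function name are hypothetical illustrations, not YAGO's actual data or code.

```python
# Hypothetical, hand-made tables: the range of each relation and the type of each value.
RANGE = {"bornOnDate": "timepoint", "bornIn": "city"}
TYPE = {"Sydney": "city", "1980-12-19": "timepoint"}


def reductive_type_check(facts):
    """Keep only the facts whose second argument has the type required by the range."""
    kept = []
    for relation, subject, value in facts:
        required = RANGE.get(relation)
        # A fact survives only if the relation has a known range and the value matches it.
        if required is not None and TYPE.get(value) == required:
            kept.append((relation, subject, value))
    return kept


candidates = [
    ("bornOnDate", "Claus_Kent", "Sydney"),      # violates range(bornOnDate, timepoint)
    ("bornOnDate", "Claus_Kent", "1980-12-19"),
    ("bornIn", "Claus_Kent", "Sydney"),
]
checked = reductive_type_check(candidates)
```

In this toy run the fact bornOnDate(Claus_Kent, Sydney) is discarded, while the other two candidates survive.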
YAGO: Numbers
Relations:  100 (bornIn, actedIn, hasInflation, ...)
Entities:   2 million
Facts:      19 million
Accuracy:   95%
One of the largest public free ontologies
Unprecedented quality among automatically constructed ontologies
YAGO: Model
#1 (ClausKent, is_a, boxer)
#2 (#1, since, 1990)
#3 (#1, source, Wikipedia)
[Figure: the edge "ClausKent is a boxer" carries two further edges, "since 1990" and "source Wikipedia"]
YAGO: Model
A YAGO ontology over
• a set of relations R
• a set of common entities C
• a set of fact identifiers I
is a function
  $I \rightarrow (R \cup C \cup I) \times R \times (R \cup I \cup C)$
Example:
  #1 (ClausKent, is_a, boxer)
  #2 (#1, since, 1990)
  #3 (#1, source, Wikipedia)
We can talk about
• facts (#1, source, Wikipedia)
• additional arguments (#1, since, 1990)
• relations (since, hasRange, time_interval)
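The model can be sketched as a small store that maps fact identifiers to triples, so that a fact identifier can itself appear as an argument of another fact. The class `Ontology` and its methods are hypothetical names chosen for this illustration; this is a sketch under those assumptions, not the YAGO implementation.

```python
class Ontology:
    """Toy store: each fact identifier maps to one (arg1, relation, arg2) triple."""

    def __init__(self):
        self.facts = {}   # fact identifier -> triple
        self._ids = {}    # triple -> fact identifier, so every fact occurs exactly once

    def add(self, arg1, relation, arg2):
        triple = (arg1, relation, arg2)
        if triple in self._ids:
            return self._ids[triple]
        fact_id = f"#{len(self.facts) + 1}"
        self.facts[fact_id] = triple
        self._ids[triple] = fact_id
        return fact_id

    def about(self, fact_id):
        """Return all facts whose first argument is the given fact identifier."""
        return [t for t in self.facts.values() if t[0] == fact_id]


y = Ontology()
f1 = y.add("ClausKent", "is_a", "boxer")
f2 = y.add(f1, "since", "1990")
f3 = y.add(f1, "source", "Wikipedia")
```

Here `y.about(f1)` lists the two statements made about fact #1, and adding the same triple twice returns the existing identifier.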
YAGO: Summary
YAGO is an ontology that is
• large (combining Wikipedia and WordNet)
• accurate (using extensive quality control)
• computationally tractable (with decidable consistency)
Plan of Attack
• Motivation
• Building a Core Ontology: YAGO
• Extending the Core Ontology: SOFIE
SOFIE: Goal Statement
Goal: Extending the ontology
Saint Nicholas was born in Patara.  ~~>  Saint Nicholas  bornIn  Patara
SOFIE: Goal Statement
Goal: Extending the ontology
Saint Nicholas се е родил в Patara.  ~~>  Saint Nicholas  bornIn  Patara
(Bulgarian: "Saint Nicholas was born in Patara.")
SOFIE: Goal Statement
Goal: Extending the ontology
Saint Nicholas was born in Patara.  ~~>  Saint Nicholas  bornIn  Patara
Previous Approaches:
• Extract knowledge from corpora (e.g. the Web)
  (Text2Onto, Espresso, Snowball, TextRunner)
  Problems: Low accuracy, non-canonicity, e.g.
    recoverWithout(most_people, medication)
    areUnder(0%, the_age_of_18)
    support(these_findings, the_notion)
SOFIE: Goal Statement
Goal: Extending the ontology
Saint Nicholas was born in Patara.  ~~>  Saint Nicholas  bornIn  Patara
Our Approach (1):
• LEILA - Combining Linguistic and Statistical Analysis
  [Suchanek et al.: KDD 2006]
  Has high accuracy, but does not deliver canonicity
SOFIE: Goal Statement
Goal: Extending the ontology
Saint Nicholas was born in Patara.  ~~>  Saint Nicholas  bornIn  Patara
Our Approach (2):
• SOFIE: Use logical reasoning to guarantee canonicity
SOFIE: Example
YAGO: Elvis Presley  bornInYear  1935
~ Worshipped People ~
  Saint Nicholas was born in the year 1417.
  Elvis Presley was born in the year 1935.
Pattern occurrence ~~> pattern meaning:
  "was born in the year" expresses bornInYear
SOFIE: Example
YAGO: Elvis Presley  bornInYear  1935
~ Worshipped People ~
  Saint Nicholas was born in the year 1417.
  Elvis Presley was born in the year 1935.
Pattern occurrence ~~> pattern meaning:
  "was born in the year" expresses bornInYear
Pattern occurrence ~~> sentence meaning:
  Saint Nicholas  bornInYear  1417
SOFIE: Example
YAGO: Elvis Presley  bornInYear  1935; Saint Nicholas  diedInYear  347
~ Worshipped People ~
  Saint Nicholas was born in the year 1417.
  Elvis Presley was born in the year 1935.
Pattern occurrence ~~> pattern meaning:
  "was born in the year" expresses bornInYear
Pattern occurrence ~~> sentence meaning:
  Saint Nicholas  bornInYear  1417
People should be born before they die.
SOFIE: Example
Task 1: Find Patterns
Task 2: Use semantic reasoning
Task 3: Disambiguate entities
YAGO: Elvis Presley  bornInYear  1935; Saint Nicholas  diedInYear  347
Text:
  Saint Nicholas was born in the year 1417.
  Elvis Presley was born in the year 1935.
Pattern occurrence ~~> pattern meaning
Pattern occurrence ~~> sentence meaning:
  Saint Nicholas  bornInYear  1417
People should be born before they die.
SOFIE: It‘s all logical formulae!
Task 1: Find Patterns
Task 2: Use semantic reasoning
Task 3: Disambiguate entities
YAGO facts:
  bornInYear(ElvisPresley,1935)
  diedInYear(NicholasOfMyra,347)
Textual facts:
  occurs("was born in the year", SaintNicholas,1417)
  occurs("was born in the year", ElvisPresley,1935)
Rules:
  occurs(P,X,Y) /\ expresses(P,R) => R(X,Y)
  bornInYear(X,B) /\ diedInYear(X,D) => B<D
Disambiguation priors:
  means(SaintNicholas,NicholasOfMyra) 0.8
  means(SaintNicholas,NicholasOfFlüe) 0.2
Open questions:
  refersTo(SaintNicholas,NicholasOfFlüe) ?
  bornOnDate(NicholasOfFlüe, 1417) ?
SOFIE: Information Extraction as MAX SAT
We have a Weighted MAX SAT Problem
  r(x,y) /\ s(x,z) => t(x,z) [w]
  ...
Problem:
• The Weighted MAX SAT Problem is NP-hard
• Our instance contains YAGO (19 million facts)
  and textual facts (e.g. 10,000 facts)
• The best-known approximation algorithm
  cannot deal well with our specific instance
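To make the objective concrete, here is a brute-force sketch of Weighted MAX SAT on a four-variable toy instance loosely modelled on the Saint Nicholas example. It is not SOFIE's algorithm: it enumerates all assignments, which is only feasible for a handful of variables. The variable names, the integer weights (8 and 2 for the priors, 100 for the hard rules) and the clause set are assumptions made for this illustration.

```python
from itertools import product


def weighted_max_sat(variables, clauses):
    """Return (best weight, best assignment) by trying every truth assignment."""
    best_weight, best_assignment = -1, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A clause is satisfied if at least one of its literals matches the assignment.
        weight = sum(
            w for w, literals in clauses
            if any(assignment[var] == polarity for var, polarity in literals)
        )
        if weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


# Hypothetical variables:
#   M  = refersTo(SaintNicholas, NicholasOfMyra)   F  = refersTo(SaintNicholas, NicholasOfFlüe)
#   BM = bornInYear(NicholasOfMyra, 1417)          BF = bornInYear(NicholasOfFlüe, 1417)
VARIABLES = ["M", "F", "BM", "BF"]
CLAUSES = [
    (8, [("M", True)]),                    # disambiguation prior 0.8 (scaled by 10)
    (2, [("F", True)]),                    # disambiguation prior 0.2 (scaled by 10)
    (100, [("M", False), ("F", False)]),   # the name refers to at most one entity
    (100, [("M", False), ("BM", True)]),   # M  => BM (the sentence states the birth year)
    (100, [("F", False), ("BF", True)]),   # F  => BF
    (100, [("BM", False)]),                # born 1417 contradicts diedInYear 347
]
weight, assignment = weighted_max_sat(VARIABLES, CLAUSES)
```

Although the prior favours NicholasOfMyra, the best assignment sets F and BF to true, because the alternative would violate one of the heavily weighted rules.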
SOFIE: A Unifying Framework
Task 1: Find Patterns
Task 2: Use semantic reasoning
Task 3: Disambiguate entities
  ~~> one algorithm: Functional MAX SAT
  ~~> NicholasOfFlüe, 1417
[Suchanek et al: TR 2009]
SOFIE: Experiments
Corpus                   Type            # Docs  Relations  Time   Precision
Wikipedia toy corpus     structured         100          3  8min   100%
Wikipedia subcorpus      semistructured    2000         15  15h    94%
News article toy corpus  unstructured       150          1  24min  91%
Biographies from Web     unstructured      3440          5  15h    90%
SOFIE: Summary
SOFIE unifies 3 tasks in a single framework:
Task 1: Find Patterns
Task 2: Use semantic reasoning
Task 3: Disambiguate entities
SOFIE delivers
• canonicalized facts
• of high precision
But back to the original question...
Is there any Australian guy taller than
1.90m who could help me out?
Conclusion: Good News
We made a great step towards gathering
the knowledge of this world in a structured ontology
• YAGO
• SOFIE
Christmas is safe!
References
[Suchanek et al.: KDD 2006] Fabian M. Suchanek, Georgiana Ifrim and Gerhard Weikum
"Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents"
Conference on Knowledge Discovery and Data Mining (KDD 2006)
[Suchanek et al.: WWW 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum
"YAGO - A Core of Semantic Knowledge"
International World Wide Web conference (WWW 2007)
[Suchanek et al.: JWS 2008] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum
"YAGO - A Large Ontology from Wikipedia and WordNet"
Journal of Web Semantics (JWS 2008)
[Suchanek et al.: TR 2009] Fabian M. Suchanek, Mauro Sozio and Gerhard Weikum
"SOFIE - A Self-Organizing Framework for Information Extraction"
Submitted to the International World Wide Web conference (WWW 2009)
See Technical Report or my PhD Thesis on http://mpii.de/~suchanek
Acronyms
LEILA: Learning to Extract Information by Linguistic Analysis
YAGO: Yet Another Great Ontology
SOFIE: Self-Organizing Framework for Information Extraction
NAGA: Not another Google Answer
YAGO: Thematic vs Conceptual Categories
Shallow linguistic noun phrase parsing:
              Premodifier  Head    Postmodifier
conceptual:   Australian   boxers  of German origin
thematic:     Kick         boxing  in Australia
Heuristics: If the head is a plural word, the category is conceptual
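The plural-head heuristic can be sketched in a few lines. This is a rough illustration under stated assumptions, not the parser used for YAGO: the postmodifier is assumed to start at the first preposition from a small hand-picked list, and a word counts as plural if it ends in "s" but not in "ss". Both function names are hypothetical.

```python
# Assumption: a small, hand-picked preposition list marks the start of the postmodifier.
PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on", "with"}


def split_category(name):
    """Split a category name into (premodifier, head, postmodifier)."""
    words = name.split()
    cut = next((i for i, w in enumerate(words) if w.lower() in PREPOSITIONS), len(words))
    if cut == 0:
        raise ValueError("no head found in: " + repr(name))
    # The head is the last word before the postmodifier.
    return " ".join(words[:cut - 1]), words[cut - 1], " ".join(words[cut:])


def is_conceptual(name):
    """Heuristic from the slide: a plural head marks a conceptual category."""
    _, head, _ = split_category(name)
    # Assumption: crude plural test; a real system would use a morphological analyzer.
    return head.lower().endswith("s") and not head.lower().endswith("ss")


parts = split_category("Australian boxers of German origin")
```

With these assumptions, "Australian boxers of German origin" has the head "boxers" and is classified as conceptual, while "Kick boxing in Australia" has the head "boxing" and is classified as thematic.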
YAGO: Upper Model
[Figure: WordNet holds many classes for the word below Person (Boxer 1 ... Boxer 42); the Wikipedia class "Australian boxer" (with an instance born 1980) has to be attached to one of them]
A Hitchhiker's Guide to Ontology
DBpedia
(HU Berlin)
YAGO forms
taxonomic
backbone
YAGO is part of the
project by its Web
service
Linking Open Data
(HU Berlin,
U Leipzig,
OLS Inc.)
KOG
(U
Washington)
SUMO
(research
project)
YAGO and
SUMO have
been merged
YAGO
YAGO will
be included
Planned
YAGO is
used for
bootstrapping
Freebase
(community)
YAGO
contributes
the entities
UMBEL
(commercial)
Cyc
(commercial)
[Suchanek et al.:
JWS 2008]
YAGO: Applications
• NAGA (Semantic Search & Ranking) [Kasneci et al: ICDE 2008]
• TagBooster (User Study on Social Tagging) [Suchanek et al.: CIKM 2008]
• ESTER (Semantic Search + Full Text Search) [Bast et al.: SIGIR 2007]
• Projects by other people
YAGO: Relations
is a
familyName
givenName
bornOnDate
diedOnDate
bornIn
diedIn
locatedIn
establishedOnDate
isMarriedTo
hasPopulation
hasHeight
hasWeight
hasInflation
actedIn
...
100 relations
YAGO: Size*
KnowItAll       30,000
SUMO            60,000
WordNet        200,000
OpenCyc        300,000
Cyc          3,000,000
YAGO        19,000,000
* Publicly available ontologies with a quality guarantee. Size is not correlated with usefulness.
YAGO: Model
Axioms:
  (x, is_a, y) /\ (y, subclass, z) => (x, is_a, z)
  ...
[Figure: an individual is a saint, saint subclass person, hence the individual is a person]
YAGO: Model
Axioms:
  (x, is_a, y) /\ (y, subclass, z) => (x, is_a, z)
  ...
Derive facts:     f1, f2, f3, f4, f5  ~~>  f1, f2, f3, f4, f5, f6, f7, f8, f9, f10  (finite, unique)
Eliminate facts:  f1, f2, f3, f4, f5  ~~>  f1, f2, f3  (finite, unique)
[Suchanek et al.: WWW 2007]
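The "derive facts" direction can be sketched as a fixpoint computation that applies the single axiom shown on the slide until nothing new appears. The function name `derive` and the three base facts are hypothetical; the full axiom set is elided on the slide and is not reproduced here.

```python
def derive(facts):
    """Apply (x, is_a, y) /\\ (y, subclass, z) => (x, is_a, z) until a fixpoint is reached."""
    closure = set(facts)
    while True:
        new = {
            (x, "is_a", z)
            for (x, r1, y) in closure if r1 == "is_a"
            for (y2, r2, z) in closure if r2 == "subclass" and y2 == y
        } - closure
        if not new:
            # Nothing left to derive: the result is finite and independent of rule order.
            return closure
        closure |= new


# Hypothetical base facts.
base = {
    ("Nicholas", "is_a", "saint"),
    ("saint", "subclass", "person"),
    ("person", "subclass", "entity"),
}
closure = derive(base)
```

Starting from three base facts, the loop adds (Nicholas, is_a, person) and (Nicholas, is_a, entity) and then stops.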
YAGO: Knowledge Representation
OWL Full
RDFS
ADTs
YAGO
Acyclicity
Reification
subClassOf
Datatypes
Transitivity
Property
Restrictions
OWL DL
SOFIE rules!
occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y)
  => expresses(P,R)

occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y)
  /\ domain(R,D1) /\ range(R,D2) /\ type(X,D1) /\ type(Y,D2)
  => R(X,Y)

R(X,Y) /\ R(X,Z) /\ type(R,functionalRelation)
  => Y = Z

disambiguationPrior(W,X) => refersTo(W,X)

+ relation-dependent rules:
bornInYear(X,B) /\ diedInYear(X,D) => B<D
SOFIE: Clause transformation
Rules:     r(X,Y) /\ s(X,Y) => t(X,X)
           u(a)
Entities:  {a,b}

Grounded Rules                 Clauses
r(a,a) /\ s(a,a) => t(a,a)     ¬r(a,a) \/ ¬s(a,a) \/ t(a,a)
r(a,b) /\ s(a,b) => t(a,a)     ¬r(a,b) \/ ¬s(a,b) \/ t(a,a)
r(b,a) /\ s(b,a) => t(b,b)     ¬r(b,a) \/ ¬s(b,a) \/ t(b,b)
r(b,b) /\ s(b,b) => t(b,b)     ¬r(b,b) \/ ¬s(b,b) \/ t(b,b)
u(a)                           u(a)
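A small sketch of the grounding step for the one rule on this slide, using the equivalence of A /\ B => C with the clause ¬A \/ ¬B \/ C. The literal encoding as (predicate, arg1, arg2, is_positive) tuples and the function names are assumptions made for this illustration.

```python
from itertools import product


def ground(entities):
    """Ground r(X,Y) /\\ s(X,Y) => t(X,X) over all entity pairs, one clause per pair."""
    clauses = []
    for x, y in product(entities, repeat=2):
        # The two premises become negative literals, the conclusion a positive one.
        clauses.append([("r", x, y, False), ("s", x, y, False), ("t", x, x, True)])
    return clauses


def show(clause):
    """Render a clause in the notation of the slides."""
    return " \\/ ".join(
        ("" if positive else "¬") + f"{pred}({a},{b})" for pred, a, b, positive in clause
    )


clauses = ground(["a", "b"])
for clause in clauses:
    print(show(clause))
```

For the entities {a,b} this yields four clauses, one per pair (X,Y).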
SOFIE: Clause transformation
Clauses                          Textual Facts
¬r(a,a) \/ ¬s(a,a) \/ t(a,a)     r(a,a) [w1]
¬r(a,b) \/ ¬s(a,b) \/ t(a,a)     r(a,b) [w2]
¬r(b,a) \/ ¬s(b,a) \/ t(b,b)     r(b,a) [w3]
¬r(b,b) \/ ¬s(b,b) \/ t(b,b)     r(b,b) [w4]
u(a)
YAGO (known to be true, 1): s(a,a)
SOFIE: Clause weighting
Clauses
1
\/ 1
Textual Facts
\/ t(a,a) [w1]
r(a,a) [w1]
1
\/ s(a,b) \/ t(a,a) [w2]
r(a,b) [w2]
1
\/ s(b,a) \/ t(b,b) [w3]
r(b,a) [w3]
1
\/ s(b,b) \/ t(b,b) [w4]
r(b,b) [w4]
u(a) [W]
YAGO
s(a,a)
SOFIE: Hypothesis generation
Textual Facts
Rules
r(X,Y) /\ s(X,Y) => t(X,X)
r(a,b) [w1]
Hypotheses
t(a,a)
t(b,b)
SOFIE: Hypothesis generation
Grounded Rules
Rules
r(X,Y) /\ s(X,Y) => t(X,X)
r(a,a) /\ s(a,a) => t(a,a)
r(a,b) /\ s(a,b) => t(a,a)
Hypotheses
t(a,a)
SOFIE: Functional MAX SAT Algorithm
The functional MAX SAT Algorithm
considers only unit clauses.
Variables
=0
X
Clauses
Y
X \/ Y [w1]
Z
=0
=1
X \/ Z [w1]
Y \/ Z [w1]
Z [w1]
SOFIE: Experiments
Corpus                   Type                      # Docs  # Rel  Time   # Facts  Precision  Recall
Wikipedia toy corpus     structured                   100      3  8min   165      100%       98%
Wikipedia toy corpus     semi-structured              100      3  8min   165      100%       57%
                         (50% infoboxes removed)
Wikipedia subcorpus      semi-structured             2000     15  15h    505      94%        ?
News article toy corpus  unstructured                 150      1  24min  35, 46   91%        24%, 31%
  Snowball                                                               65       56%        31%
Biographies from Web     unstructured                3440      5  15h    744      90%        ?
SOFIE: Large-Scale Experiment
Corpus: 3700 biography documents downloaded from the Web
Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Runtime: (summed over 5 batches)
  Parsing                 7:05h
  Hypothesis Generation   6:15h
  Solving                 2:30h
  Total                  15:50h
Results: (precision in %)
87 87 13
98 95 90
bornIn bornOnD diedIn diedOnD polOf
SOFIE: Relation to Markov Logic
r(x,y) /\ s(x,z) => t(x,z) [w]
...
$P(X) \sim e^{\sum_i \mathrm{sat}(i,X)\, w_i}$
  sat(i,X): number of satisfied instances of the ith formula
  w_i: weight of the ith formula
$\max_X \; e^{\sum_i \mathrm{sat}(i,X)\, w_i}$
$\max_X \; \log\left( e^{\sum_i \mathrm{sat}(i,X)\, w_i} \right)$
$\max_X \; \sum_i \mathrm{sat}(i,X)\, w_i$
~~~~> Weighted MAX SAT problem
[Figure: probability P of bornIn(Nicholas, Patras) being false vs. true]
LEILA: Workflow
Fix one relation, e.g. foundedInYear
Examples:
UDS 1948
UDS 1949
UDS 1950
...
MPII 1988
MPII 1989
MPII 1990
...
The UDS was founded in 1948.
X was founded in Y 10 2
The UDS has 1974 employees.
X has Y employees 3 20
The MPII has 1'000 employees.
The MPI-SWS was founded in 2004.
The MPI-SWS has 2'003 employees.
foundedIn(MPI-SWS, 2004)
LEILA: Theoretical considerations
THEOREM [Goodnaturedness]:
As the number of parsed sentences increases,
the probability of false extractions decreases.
Intuition: One of two cases applies
1. A pattern occurs very frequently.
Then it is unlikely to be mistaken for a good pattern
2. A pattern occurs very infrequently.
Then it does not matter if it is mistaken for a good pattern.
[Suchanek et al.:
KDD 2006]
LEILA: The Linguistic Part
X was founded in Y
The MPI-SWS was founded in 2004.
foundedIn(MPI-SWS, 2004)
LEILA: The Linguistic Part
X was founded in Y
The MPI-SWS, the great institution, was founded in 2004.
foundedIn(MPI-SWS, 2004)
Future Work: With YAGO
• personalize (Shady, Maya)
• use social networks to extend YAGO (Maya, Sharat, Ashwin)
• make YAGO multilingual (Gerard)
• add Web services (Nicoleta)
• make querying efficient (Gjergji)
• store YAGO efficiently (Thomas)
• make reasoning efficient (Mauro, Martin)
• provide good visualization (Shady)
• add a temporal component to SOFIE
• add biomedical knowledge (Alessandro Fiori)
• add multimodal support (Martin Schreiber)
• add natural language support (help from workshop on Monday)
(slide by Prof. Gerhard Weikum)
Future Work: Beyond YAGO
• join forces with other ontology projects
• learn not just facts, but also relations
• apply the SOFIE approach in related settings
  (information extraction with music or pictures?)
Acknowledgements
The following people have worked with me
LEILA: Georgiana Ifrim and Gerhard Weikum
YAGO: Gjergji Kasneci and Gerhard Weikum
SOFIE: Mauro Sozio and Gerhard Weikum
TagBooster: Milan Vojnovic and Dinan Gunawardena
NAGA: Gjergji Kasneci, Georgiana Ifrim, Shady Elbassuoni and Gerhard Weikum
ESTER: Holger Bast, Ingmar Weber and Alex Chitea
YAGO+SUMO: Gerard de Melo and Adam Pease
STAR: Gjergji, Mauro, Maya Ramanath and Gerhard
Thank you for making these projects possible!