Transcript Yago-QA

YAGO-QA
Answering Questions by
Structured Knowledge Queries





Peter Adolphs
Martin Theobald
Ulrich Schäfer
Hans Uszkoreit
Gerhard Weikum
ICSC
Stanford University
September 19, 2011
Jeopardy!
A big US city with two airports, one named after a World
War II hero, and one named after a World War II battle field?
YAGO-QA: Answering Questions by Structured Knowledge Queries
08.04.2015
Deep-QA in NL
William Wilkinson's "An Account of the
Principalities of Wallachia and Moldavia"
inspired this author's most famous novel
This town is known as "Sin City" & its
downtown is "Glitter Gulch"
As of 2010, this is the only
former Yugoslav republic in the EU
99 cents got me a 4-pack of Ytterlig coasters
from this Swedish chain
question classification & decomposition
knowledge backends
D. Ferrucci et al.: Building Watson: An Overview of the
DeepQA Project. AI Magazine, 2010.
YAGO
www.ibm.com/innovation/us/watson/index.htm
Structured Knowledge Queries
A big US city with two airports, one named after a World
War II hero, and one named after a World War II battle field?
SELECT DISTINCT ?c WHERE {
  ?c type City . ?c locatedIn USA .
  ?a1 type Airport . ?a2 type Airport .
  ?a1 locatedIn ?c . ?a2 locatedIn ?c .
  ?a1 namedAfter ?p . ?p type WarHero .
  ?a2 namedAfter ?b . ?b type BattleField .
}
In this work: focus on factoid and list questions
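To make the join semantics of the query above concrete, here is a minimal sketch that evaluates the same triple patterns over a tiny in-memory triple store. The triples and entity names are illustrative assumptions, not YAGO identifiers, and the nested-loop join is only a stand-in for a real SPARQL engine.

```python
# Toy triple store; all names here are illustrative assumptions, not YAGO data.
TRIPLES = [
    ("Chicago", "type", "City"), ("Chicago", "locatedIn", "USA"),
    ("OHare_Airport", "type", "Airport"), ("Midway_Airport", "type", "Airport"),
    ("OHare_Airport", "locatedIn", "Chicago"), ("Midway_Airport", "locatedIn", "Chicago"),
    ("OHare_Airport", "namedAfter", "Edward_OHare"), ("Edward_OHare", "type", "WarHero"),
    ("Midway_Airport", "namedAfter", "Battle_of_Midway"),
    ("Battle_of_Midway", "type", "BattleField"),
    ("Boston", "type", "City"), ("Boston", "locatedIn", "USA"),
    ("Logan_Airport", "type", "Airport"), ("Logan_Airport", "locatedIn", "Boston"),
]

# The triple patterns of the query on the slide; terms starting with "?" are variables.
QUERY = [
    ("?c", "type", "City"), ("?c", "locatedIn", "USA"),
    ("?a1", "type", "Airport"), ("?a2", "type", "Airport"),
    ("?a1", "locatedIn", "?c"), ("?a2", "locatedIn", "?c"),
    ("?a1", "namedAfter", "?p"), ("?p", "type", "WarHero"),
    ("?a2", "namedAfter", "?b"), ("?b", "type", "BattleField"),
]


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant mismatch
    return extended


def evaluate(patterns, triples):
    """Join the triple patterns one at a time (naive nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# DISTINCT projection onto ?c
cities = {binding["?c"] for binding in evaluate(QUERY, TRIPLES)}
print(cities)
```

On this toy data only the city with both a hero-named and a battlefield-named airport survives all ten joins.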
Agenda
YAGO Server & API
- Wikipedia-based information extraction
- Searching & ranking in large RDF graphs
Names, Surface Patterns & Paraphrases
- Named entity disambiguation
- Mapping surface patterns onto semantic relations
- Crowdsourcing for question paraphrases
YAGO-QA Architecture
- Template-based mapping of NL questions onto SPARQL
Conclusions & Future Work

Information Extraction from Wikipedia
Subj.                 Pred.          Obj.
Stanford University   type           Private University
Stanford University   hasPresident   J.L.Hennessy
Stanford University   hasStudents    15,319
Stanford University   foundedBy      L.Stanford
Stanford University   foundedIn      1891
…                     …              …
YAGO Knowledge Base

- Combine knowledge from WordNet & Wikipedia
- Additional gazetteers (geonames.org)
- Part of the LinkedData cloud
YAGO-2 Numbers
                           Just Wikipedia   Incl. Gazetteer Data
#Relations                            104                    114
#Classes                          364,740                364,740
#Entities                       2,641,040              9,804,102
#Facts                        120,056,073            461,893,127
- types & classes               8,649,652             15,716,697
- base relations               25,471,211            196,713,637
- space, time & proven.        85,935,210            249,462,793
Size (CSV format)                  3.4 GB                 8.7 GB
Estimated precision > 95% (for base relations excl. space, time & provenance)
www.mpi-inf.mpg.de/yago-naga/
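As a quick sanity check on the YAGO-2 numbers, the three fact categories should add up to the reported #Facts total in each column. The sketch below only re-uses the figures from the slide; the dictionary layout and the function name are illustrative.

```python
# Fact counts copied from the slide; the structure is just for illustration.
YAGO2_FACTS = {
    "Just Wikipedia": {
        "total": 120_056_073,
        "parts": {
            "types & classes": 8_649_652,
            "base relations": 25_471_211,
            "space, time & provenance": 85_935_210,
        },
    },
    "Incl. Gazetteer Data": {
        "total": 461_893_127,
        "parts": {
            "types & classes": 15_716_697,
            "base relations": 196_713_637,
            "space, time & provenance": 249_462_793,
        },
    },
}


def breakdown_gap(column):
    """Return total minus the sum of the listed categories (0 if consistent)."""
    entry = YAGO2_FACTS[column]
    return entry["total"] - sum(entry["parts"].values())


for column in YAGO2_FACTS:
    print(column, "gap:", breakdown_gap(column))
```

Both columns are internally consistent, which also shows that the gazetteer data mostly adds base relations and space/time/provenance facts rather than new classes.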
Searching & Ranking RDF Graphs in NAGA
Ranking based on confidence, compactness and relevance
Discovery queries:
[Figure: query graph; recoverable labels: Kiel, bornIn, $x, type, scientist, hasSon, $y, hasWon, Nobel prize, diedOn, $a, $b, >]
Connectedness queries:
[Figure: query graph; recoverable labels: German novelist, type, Thomas Mann, *, Goethe]
Queries with regular expressions:
[Figure: query graphs; recoverable labels: Ling, hasFirstName | hasLastName, $x, (coAuthor | advisor)*, Beng Chin Ooi; $x, type, scientist, worksFor, $y, locatedIn*, Zhejiang]
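The starred edge label locatedIn* in the regular-expression queries stands for zero or more locatedIn hops. A minimal sketch of how such a path expression could be evaluated is a breadth-first closure over one relation. The graph below is made up for illustration (Alice and Bob are hypothetical), and this is not NAGA's actual evaluation or ranking strategy.

```python
from collections import deque

# Illustrative edges; entity names are assumptions, not taken from YAGO.
EDGES = [
    ("Alice", "type", "scientist"),
    ("Alice", "worksFor", "Zhejiang_University"),
    ("Bob", "type", "scientist"),
    ("Bob", "worksFor", "Stanford_University"),
    ("Zhejiang_University", "locatedIn", "Hangzhou"),
    ("Hangzhou", "locatedIn", "Zhejiang"),
    ("Stanford_University", "locatedIn", "California"),
]


def closure(start, relation, edges):
    """All nodes reachable from `start` via zero or more `relation` edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subj, pred, obj in edges:
            if subj == node and pred == relation and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


def scientists_located_in(region, edges):
    """Evaluate: $x type scientist . $x worksFor $y . $y locatedIn* region"""
    scientists = {s for s, p, o in edges if p == "type" and o == "scientist"}
    answers = set()
    for subj, pred, obj in edges:
        if pred == "worksFor" and subj in scientists:
            if region in closure(obj, "locatedIn", edges):
                answers.add((subj, obj))
    return answers


print(scientists_located_in("Zhejiang", EDGES))
```

Because the closure is reflexive, an employer that is itself the target region would also match, which is what the "zero or more" reading of the star requires.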
YAGO Server: UI & API
YAGO-UI
- Interactive online demo
- RDF with time, space & provenance annotations
- SPARQL + keywords
YAGO-API
Two basic WebServices:
- processQuery(String query)
- getYagoEntitiesByNames(String[] names)
- …
www.mpi-inf.mpg.de/yago-naga/demo.html
Names, Surface Patterns & Paraphrases
Which chemist was born in London?
POS tags: chemist/NN, was/VBD born/VBN in/IN, London/NNP/LOC
(I) Named entity disambiguation
- chemist → wordnet_chemist, wordnet_pharmacist
- born → Bertran_de_Born, Born_Identity_(Movie), Born_(Album)
- London → London_UK, London_Arkansas, Antonio_London
(II) Mapping surface patterns onto semantic relations
- <person> was_born_in <location> → bornIn(<person>, <location>)
- <person> was_born_in <date> → bornOn(<person>, <date>)
(III) Paraphrases of questions
- <person> [was] born in <location>
- <location>-born <person>
  → bornIn(<person>, <location>)
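Step (II) hinges on the argument type: the same surface pattern was_born_in maps to bornIn for a location and to bornOn for a date. The sketch below illustrates that lookup. The pattern table mirrors the slide, while the date heuristic (a four-digit year) and the function names are simplifying assumptions, not the system's actual type inference.

```python
import re

# (surface pattern, argument type) -> semantic relation, as listed on the slide.
PATTERN_TABLE = {
    ("was_born_in", "location"): "bornIn",
    ("was_born_in", "date"): "bornOn",
}


def argument_type(argument):
    """Crude typing assumption: a bare four-digit year is a date, anything else a location."""
    return "date" if re.fullmatch(r"\d{4}", argument) else "location"


def to_relation(subject, surface_pattern, argument):
    """Pick the relation for a surface pattern based on the type of its argument."""
    relation = PATTERN_TABLE[(surface_pattern, argument_type(argument))]
    return f"{relation}({subject}, {argument})"


print(to_relation("?x", "was_born_in", "London"))
print(to_relation("?x", "was_born_in", "1879"))
```

In the real pipeline the argument type would come from named entity disambiguation in step (I) rather than from a regular expression.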
(I) Named Entity Disambiguation

Wikipedia link structure
- 65,872,435 intra-wiki links
- 2,782,297 disambiguation pages & 328,372 redirects
- 2,886,027 distinct link anchor texts
→ YAGO “means” relation
- 18,470,099 mappings of names to entities
- 6.2 distinct names per entity (on avg.)
Individual name disambiguation vs. joint disambiguation
AIDA tool for graph-based disambiguation in YAGO-2:
“Robust Disambiguation of Named Entities in Text”
J. Hoffart et al. In EMNLP, Edinburgh, Scotland, 2011
#inlinks with anchor “Paris”
Paris                              32,362
Paris, France                         570
Paris Masters                         134
Paris (mythology)                     118
University of Paris                    79
Paris, Texas                           56
Paris, Ontario                         45
Paris (rapper)                         29
Open Gaz de France                     26
Paris, Kentucky                        20
Paris (2008 film)                      19
Gare Saint-Lazare                      18
Paris, Tennessee                       17
BNP Paribas Masters                    16
Paris, Maine                           14
Paris Hilton                           12
Paris, Arkansas                        11
Paris (Supertramp album)               10
Gare du Nord                            9
Paris (1979 TV series)                  8
Count Paris                             7
Palais Omnisports de Paris-Bercy        6
Paris, Virginia                         5
Paris 2012 Olympic bid                  4
Paris (2003 film)                       3
www.mpi-inf.mpg.de/yago-naga/aida/
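Anchor-text counts like these can serve as a popularity prior for individual name disambiguation: the share of links with a given anchor that point to each candidate entity. The sketch below shows that idea on three candidates only. Pairing the three largest counts with the first three listed entities is an assumption about the chart layout, and this prior alone is not the graph-based AIDA method cited on the slide.

```python
def popularity_prior(anchor_counts):
    """Turn raw inlink counts for one anchor text into a probability per entity."""
    total = sum(anchor_counts.values())
    return {entity: count / total for entity, count in anchor_counts.items()}


# Subset of candidates for the anchor "Paris"; the count-to-entity pairing is assumed.
counts = {"Paris": 32362, "Paris, France": 570, "Paris Masters": 134}

prior = popularity_prior(counts)
best = max(prior, key=prior.get)
print(best, round(prior[best], 3))
```

A prior this skewed almost always picks the dominant entity, which is why joint disambiguation over all mentions in the input is needed for cases such as Paris, Texas.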
(II) From Patterns to Semantic Relations

PROSPERA – statistical pattern mining from free-text
- Domain-oriented extraction of patterns for known relations (POS-enhanced n-grams)
  X carried out his doctoral research in math under the supervision of Y
  → X { carried out PRP doctoral research [IN NP] [DET] supervision [IN] } Y
- Confidence & support based on seeds & counter seeds
- Pattern/fact-duality & consistency reasoning
  pattern-fact duality:      occurs(p,x,y) ∧ expresses(p,R) ⇒ R(x,y)
                             occurs(p,x,y) ∧ R(x,y) ⇒ expresses(p,R)
  type constraints:          Spouse ⊆ Person × Person
  inclusion dependencies:    capitalOfCountry ⊆ cityOfCountry
  functional dependencies:   Spouse(x,y): x → y, y → x
10s to 100s of typed patterns per relation
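The interplay of seeds, counter seeds and pattern-fact duality can be sketched as follows. The concrete definitions used here (confidence as the share of matched seed pairs among matched seed and counter-seed pairs, support as the share of seeds matched) are common choices and an assumption, not necessarily PROSPERA's exact formulas. The patterns and entity pairs are made up, and consistency reasoning over the constraints is not covered.

```python
def pattern_stats(occurrences, seeds, counter_seeds):
    """Compute (confidence, support) per pattern from (pattern, x, y) occurrences."""
    pairs_by_pattern = {}
    for pattern, x, y in occurrences:
        pairs_by_pattern.setdefault(pattern, set()).add((x, y))
    stats = {}
    for pattern, pairs in pairs_by_pattern.items():
        pos = len(pairs & seeds)          # co-occurrences with known facts
        neg = len(pairs & counter_seeds)  # co-occurrences with known non-facts
        confidence = pos / (pos + neg) if pos + neg else 0.0
        support = pos / len(seeds) if seeds else 0.0
        stats[pattern] = (confidence, support)
    return stats, pairs_by_pattern


def derive_facts(occurrences, seeds, counter_seeds, min_confidence=0.8):
    """Duality step: patterns that express R yield new R(x,y) candidates."""
    stats, pairs_by_pattern = pattern_stats(occurrences, seeds, counter_seeds)
    accepted = {p for p, (conf, _) in stats.items() if conf >= min_confidence}
    facts = set()
    for pattern in accepted:
        facts |= pairs_by_pattern[pattern]
    return accepted, facts - seeds


# Hypothetical occurrences for a hasAdvisor-style relation.
occurrences = [
    ("under the supervision of", "s1", "a1"),
    ("under the supervision of", "s2", "a2"),
    ("under the supervision of", "s3", "a3"),
    ("met", "s1", "a1"),
    ("met", "s4", "a5"),
]
seeds = {("s1", "a1"), ("s2", "a2")}
counter_seeds = {("s4", "a5")}

accepted, new_facts = derive_facts(occurrences, seeds, counter_seeds)
print(accepted, new_facts)
```

The generic pattern "met" also co-occurs with a counter seed and is rejected, so only the specific supervision pattern contributes a new candidate fact.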
PROSPERA Architecture

Gathering: Enhanced Hearst patterns
- POS-enhanced n-grams
- Pattern-fact duality & constraints
Analysis: Refined pattern weights
- Carefully chosen seeds and counter seeds
- Thresholds for pattern confidence & support
Reasoning: Scalable extraction & consistency reasoning
- MapReduce functions for pattern extraction & statistics gathering
- Distributed MaxSat solver (MAP Inference)
(III) Crowdsourcing for Question Paraphrases

Pattern acquisition from the crowd
- Annotators paraphrase natural-language seed questions
- Seed questions are associated with their semantic arguments and functions
- Gold resource for pattern acquisition and system evaluation
Preliminary results
- 4,620 paraphrases for 254 seed questions with 7 annotators
- Total annotation time: ~49 hours, ~1 work-day per annotator
YAGO-QA Architecture

Input analysis
- SProUT for tokenization, stemming & NER (http://sprout.dfki.de/)
- NE gazetteer extended by YAGO entities
Input interpretation
- Named-entity disambiguation based on YAGO statistics
- Vague matching against the gathered question paraphrases
YAGO-QA Architecture (cont’d)
Input interpretation / Answer retrieval
  An actor whose place of birth is Chicago.
  → Which actor was born in Chicago ?
  → Which <actor> was_born_in <Chicago> ?
  → ?x type ARG1 . ?x bornIn ARG2 .
Template-based answer generation
  Who/what is/are <?x> ?
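The last two steps of the interpretation chain, from a normalized question to a join query, can be sketched with a single hard-coded template. The regular expression, the template list and the function name are illustrative assumptions. The actual system matches vaguely against crowdsourced paraphrases and disambiguates ARG1 and ARG2 to YAGO identifiers, neither of which is modelled here.

```python
import re

# One question template with its query body; ARG1/ARG2 follow the slide's naming.
TEMPLATES = [
    (
        re.compile(r"^which (?P<ARG1>\w+) was born in (?P<ARG2>[\w ]+?)\s*\?$", re.IGNORECASE),
        "?x type {ARG1} . ?x bornIn {ARG2} .",
    ),
]


def to_query(question):
    """Return a SPARQL-like query string for the first matching template, else None."""
    for regex, body in TEMPLATES:
        match = regex.match(question.strip())
        if match:
            return "SELECT DISTINCT ?x WHERE { " + body.format(**match.groupdict()) + " }"
    return None


print(to_query("Which actor was born in Chicago ?"))
```

The paraphrase "An actor whose place of birth is Chicago." does not match this exact template and returns None, which is the gap the vague matching against gathered paraphrases is meant to close.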
YAGO-QA Example



- Multiple named entity annotations: all names are annotated
- Interpretation picks suitable NE readings
- Vague matching against surface templates
Conclusions & Future Work

QA based on structured knowledge queries
(beyond IR-style retrieval of matching sentences/paragraphs)
- Wikipedia as rich knowledge backend
- Entities, semantic classes & typed relations
- Large-scale statistics for entity disambiguation & surface patterns
- Crowdsourcing for question paraphrases
- Predefined question templates translated into join queries
Future work
- “Open-QA” via open-domain information extraction
- Dynamic learning of template structures from grammars
- More modular template structures