QA, lecture 3


Question Answering (QA)
© Johan Bos, April 2008

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Pronto architecture
[Diagram: question → parsing (CCG) → boxing (DRS) → answer typing → query generation → Indri retrieval over the indexed documents → answer extraction → answer selection → answer reranking → answer; background knowledge from WordNet and NomLex]
Question Answering – Lecture 3
• Query Generation (this section)
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Architecture of PRONTO
[PRONTO pipeline diagram, repeated from above]
Query Generation
• Once we have analysed the question, we need to retrieve appropriate documents.
• Most QA systems use an off-the-shelf information retrieval system for this task. Examples:
  – Lemur
  – Lucene
  – Indri (used by Pronto)
• The input of the IR system is a query; the output is a ranked set of documents.
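To make this contract concrete (query in, ranked documents out), here is a self-contained toy sketch; it is not Indri's actual API, and the term-overlap scoring is a deliberately naive stand-in for a real ranking function.

import re

# Toy retrieval: rank documents by how many query terms they contain.
# This only illustrates the IR contract; a real QA system would send the
# query to Lemur, Lucene or Indri instead.

def retrieve(query, documents, top_k=10):
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(re.findall(r"\w+", text.lower())))
        if overlap > 0:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)            # best-scoring documents first
    return [doc_id for _, doc_id in scored[:top_k]]

docs = {"A": "Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.",
        "B": "Prions are a kind of protein."}
print(retrieve("Who discovered prions?", docs))   # both documents contain "prions"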
Queries
• Query generation depends on the way documents are indexed.
• It is based on:
  – the semantic analysis of the question
  – the expected answer type
  – background knowledge
• Computing a good query is hard – we don’t want too few documents, and we don’t want too many!
Generating Query Terms
• Example 1:
  – Question: Who discovered prions?
  – Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.
  – Text B: Prions are a kind of protein that…
• Query terms?
Generating Query Terms
• Example 2:
  – Question: When did Franz Kafka die?
  – Text A: Kafka died in 1924.
  – Text B: Dr. Franz died in 1971.
• Query terms?
Generating Query Terms
• Example 3:
  – Question: How did actor James Dean die?
  – Text: James Dean was killed in a car accident.
• Query terms?
Useful query terms
• Ranked by importance:
  – Named entities
  – Dates or time expressions
  – Expressions in quotes
  – Nouns
  – Verbs
• Queries can be expanded using the created local knowledge base (see the sketch below).
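A rough sketch of how such an importance ranking could be turned into weighted query terms; the tag names and the annotated input format are hypothetical, standing in for the output of question analysis (NER and POS tagging).

# Weight query terms following the importance ranking above.
# The (token, tag) input format and the tag names are assumptions,
# not part of any particular toolkit.

WEIGHTS = {"NE": 5, "DATE": 4, "QUOTED": 3, "NOUN": 2, "VERB": 1}

def query_terms(tagged_question):
    terms = []
    for token, tag in tagged_question:
        weight = WEIGHTS.get(tag)
        if weight is not None:            # drop wh-words, auxiliaries, punctuation
            terms.append((token.lower(), weight))
    return sorted(terms, key=lambda t: -t[1])   # most important first

question = [("When", "WH"), ("did", "AUX"), ("Franz", "NE"),
            ("Kafka", "NE"), ("die", "VERB"), ("?", "PUNCT")]
print(query_terms(question))
# [('franz', 5), ('kafka', 5), ('die', 1)]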
Query expansion example
• TREC 44.6 (Sacajawea): How much is the Sacajawea coin worth?
• Query: sacajawea
  – Returns only five documents
• Use synonyms in query expansion
• New query: sacajawea OR sagajawea
  – Returns two hundred documents
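A minimal sketch of this kind of expansion; the variant table is a made-up stand-in for whatever the local knowledge base provides, and the OR syntax is only illustrative.

# Expand query terms with known synonyms or spelling variants,
# joining the alternatives with OR as in the example above.

VARIANTS = {
    "sacajawea": ["sagajawea", "sacagawea"],   # hypothetical variant list
}

def expand(terms):
    parts = []
    for term in terms:
        alternatives = [term] + VARIANTS.get(term, [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

print(expand(["sacajawea", "coin"]))
# (sacajawea OR sagajawea OR sacagawea) coin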
Architecture of PRONTO
[PRONTO pipeline diagram, repeated from above]
Question Answering – Lecture 3
• Query Generation
• Document Analysis (this section)
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Document Analysis – Why?
• The aim of QA is to output answers, not documents.
• We need document analysis to
  – find the correct type of answer in the documents
  – calculate the probability that an answer is correct
• Semantic analysis is important to get valid answers.
Document Analysis – When?
• After retrieval
  – token- or word-based index
  – keyword queries
  – low precision
• Before retrieval
  – semantic indexing
  – concept queries
  – high precision
  – more NLP required
Document Analysis – How?
• Ideally, use the same NLP tools as for question analysis.
  – This will make the semantic matching of question and answer easier.
  – Not always possible: wide-coverage tools are usually good at analysing text, but not at analysing questions.
  – Questions are often not part of the large annotated corpora on which NLP tools are trained.
Documents vs Passages
• Split documents into smaller passages.
  – This makes the semantic matching faster and more accurate.
  – In Pronto the passage size is two sentences, implemented by a sliding window (see the sketch below).
• Passages that are too small risk losing important contextual information
  – pronouns and referring expressions
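A minimal sketch of such a two-sentence sliding window; the regex-based sentence splitter is a naive placeholder for a proper tokeniser.

import re

# Split a document into overlapping two-sentence passages, mirroring the
# Pronto passage size described above.

def passages(text, size=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) <= size:
        return [" ".join(sentences)]
    # the window slides one sentence at a time, so adjacent passages overlap
    return [" ".join(sentences[i:i + size])
            for i in range(len(sentences) - size + 1)]

doc = "Kafka lived in Austria. He died in 1924. Max Brod was his friend."
for p in passages(doc):
    print(p)
# Kafka lived in Austria. He died in 1924.
# He died in 1924. Max Brod was his friend.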
Document Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution
Why semantics is important
• Example:
  – Question: When did Franz Kafka die?
  – Text A: The mother of Franz Kafka died in 1918.
  – Text B: Kafka lived in Austria. He died in 1924.
  – Text C: Both Kafka and Lenin died in 1924.
  – Text D: Max Brod, who knew Kafka, died in 1930.
  – Text E: Someone who knew Kafka died in 1930.
DRS for “The mother of Franz Kafka died in 1918.”:

  [x1 x2 x3 x4 |
    mother(x3), named(x4,kafka,per), named(x4,franz,per),
    die(x2), thing(x1), event(x2),
    of(x3,x4), agent(x2,x3), in(x2,x1),
    timex(x1)=+1918XXXX ]
DRS for “Kafka lived in Austria. He died in 1924.”:

  ( [x1 x2 x3 |
      male(x3), named(x3,kafka,per), live(x1),
      agent(x1,x3), named(x2,austria,loc),
      event(x1), in(x1,x2) ]
    +
    [x4 x5 |
      die(x5), thing(x4), event(x5),
      agent(x5,x3), in(x5,x4),
      timex(x4)=+1924XXXX ] )
DRS for “Both Kafka and Lenin died in 1924.”:

  [x1 x2 x3 x4 x5 x6 |
    named(x6,kafka,per), die(x5), event(x5), agent(x5,x6),
    in(x5,x4), timex(x4)=+1924XXXX,
    named(x3,lenin,per), die(x2), event(x2), agent(x2,x3),
    in(x2,x1), timex(x1)=+1924XXXX ]
DRS for “Max Brod, who knew Kafka, died in 1930.”:

  [x1 x2 x3 x4 x5 |
    named(x3,brod,per), named(x3,max,per), named(x5,kafka,per),
    know(x4), event(x4), agent(x4,x3), patient(x4,x5),
    die(x2), event(x2), agent(x2,x3), in(x2,x1),
    timex(x1)=+1930XXXX ]
DRS for “Someone who knew Kafka died in 1930.”:

  [x1 x2 x3 x4 x5 |
    person(x3), named(x5,kafka,per),
    know(x4), event(x4), agent(x4,x3), patient(x4,x5),
    die(x2), event(x2), agent(x2,x3), in(x2,x1),
    timex(x1)=+1930XXXX ]
Document Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition (this section)
• Anaphora resolution
Recall the Answer-Type Taxonomy
• We divided questions according to their expected answer type.
• Simple Answer-Type Taxonomy:
  PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY
Named Entity Recognition
• In order to make use of the answer types, we need to be able to recognise named entities of the same types in the documents:
  PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY
Example Text
Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.
Named Entity Recognition
<ENAMEX TYPE="LOCATION">Italy</ENAMEX>’s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.
NER difficulties
• Several types of entities are too numerous to include in dictionaries.
• New names turn up every day.
• Ambiguities
  – Paris, Lazio
• Different forms of the same entity in the same text
  – Brian Jones … Mr. Jones
• Capitalisation
NER approaches
• Rule-based approaches (see the toy sketch below)
  – hand-crafted rules
  – help from databases of known named entities [e.g. locations]
• Statistical approaches
  – features
  – machine learning
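A toy illustration of the rule-based flavour: a tiny gazetteer plus one capitalisation pattern, nowhere near a production NER system; the gazetteer entries and type labels are made up.

import re

# Toy rule-based NER: gazetteer lookup plus a simple "Mr./Mrs. + Name" rule.

GAZETTEER = {"italy": "LOCATION", "milan": "LOCATION",
             "arthur andersen": "ORGANIZATION"}

def tag_entities(text):
    entities = []
    lowered = text.lower()
    for name, etype in GAZETTEER.items():
        if name in lowered:
            entities.append((name, etype))
    # very rough: a title followed by a capitalised word suggests a PERSON
    for match in re.finditer(r"\bMr?s?\.\s+([A-Z]\w+)", text):
        entities.append((match.group(1), "PERSON"))
    return entities

text = "Mr. Verdi left Music Masters of Milan to join Arthur Andersen."
print(tag_entities(text))
# [('milan', 'LOCATION'), ('arthur andersen', 'ORGANIZATION'), ('Verdi', 'PERSON')]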
Document Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution (this section)
What is anaphora?
• The relation between a pronoun and another element in the same or an earlier sentence.
• Anaphoric pronouns:
  – he, him, she, her, it, they, them
• Anaphoric noun phrases:
  – the country, these documents, his hat, her dress
Anaphora (pronouns)
• Question: What is the biggest sector in Andorra’s economy?
• Corpus:
  Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: ?
Anaphora (definite descriptions)
• Question: What is the biggest sector in Andorra’s economy?
• Corpus:
  Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: ?
Anaphora Resolution
• Anaphora resolution is the task of finding the antecedents of anaphoric expressions.
• Example system:
  – Mitkov, Evans & Orasan (2002)
  – http://clg.wlv.ac.uk/MARS/
“Kafka lived in Austria. He died in 1924.”

  ( [x1 x2 x3 |
      male(x3), named(x3,kafka,per), live(x1),
      agent(x1,x3), named(x2,austria,loc),
      event(x1), in(x1,x2) ]
    +
    [x4 x5 |
      die(x5), thing(x4), event(x5),
      agent(x5,x3), in(x5,x4),
      timex(x4)=+1924XXXX ] )

After anaphora resolution the pronoun “he” (male(x3)) is bound to Kafka: the dying event shares the referent x3, as in agent(x5,x3).
Co-reference resolution
• Question: What is the biggest sector in Andorra’s economy?
• Corpus:
  Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: Tourism
Question Answering – Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing (this section)
• Answer Extraction
• Selection and Ranking
Architecture of PRONTO
[PRONTO pipeline diagram, repeated from above]
Semantic indexing
• If we index documents on the token level, we cannot search for specific semantic concepts.
• If we index documents on semantic concepts, we can formulate more specific queries.
• Semantic indexing requires a complete preprocessing of the entire document collection [can be costly].
Semantic indexing example
• Example NL question: When did Franz Kafka die?
• Term-based
  – query: kafka
  – returns all passages containing the term "kafka"
• Concept-based
  – query: DATE & kafka
  – returns all passages containing the term "kafka" and a date expression
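One simple way to realise this is to add concept tokens (DATE, PERSON, …) to each passage before indexing, so that a concept query such as DATE & kafka reduces to an ordinary term query. A sketch under that assumption, with a year regex standing in for a full TIMEX tagger:

import re

# Augment passages with concept tokens before indexing, so concept-based
# queries become plain term queries over the augmented token set.

def add_concepts(passage):
    tokens = re.findall(r"\w+", passage.lower())
    if any(re.fullmatch(r"1\d{3}|20\d{2}", t) for t in tokens):
        tokens.append("DATE")               # concept token for the index
    return tokens

def matches(query_terms, passage):
    wanted = {t if t.isupper() else t.lower() for t in query_terms}
    return wanted <= set(add_concepts(passage))

print(matches(["DATE", "kafka"], "Kafka died in 1924."))      # True
print(matches(["DATE", "kafka"], "Kafka lived in Prague."))   # False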
Question Answering – Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction (this section)
• Selection and Ranking
Architecture of PRONTO
[PRONTO pipeline diagram, repeated from above]
Answer extraction
• Passage retrieval gives us a set of ranked documents.
• Match the answer with the question:
  – DRS for the question
  – DRS for each possible document
  – score for the amount of overlap
• Deep inference or shallow matching
• Use knowledge
Answer extraction: matching
• Given a question Q and an expression A containing a potential answer, calculate a matching score
  S = match(Q,A)
  that indicates how well Q matches A.
• Example
  – Q: When was Franz Kafka born?
  – A1: Franz Kafka died in 1924.
  – A2: Kafka was born in 1883.
Using logical inference
• Recall that Boxer produces first-order representations [DRSs].
• In theory we could use a theorem prover to check whether a retrieved passage entails, or is inconsistent with, a question.
• In practice this is too costly, given the high number of possible answer + question pairs that need to be considered.
• Also: theorem provers are precise – they don’t give us any information if they almost find a proof, although this would be useful for QA.
Semantic matching
• Matching is an efficient approximation to the inference task.
• Consider flat semantic representations of the passage and the question.
• Matching gives a score of the amount of overlap between the semantic content of the question and a potential answer.
Matching Example
• Question: When was Franz Kafka born?
• Passage 1: Franz Kafka died in 1924.
• Passage 2: Kafka was born in 1883.
Semantic Matching [1]
Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Binding X=x2 matches answer(X) to the date candidate; binding Y=x1 matches franz(Y) and kafka(Y). The remaining question conditions born(E), patient(E,Y) and temp(E,X) find no counterpart in A1.

Match score = 3/6 = 0.50
Semantic Matching [2]
Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Binding X=x2 matches answer(X) to the date candidate; binding Y=x1 matches kafka(Y); binding E=x3 matches born(E) and patient(E,Y). The remaining question conditions franz(Y) and temp(E,X) find no counterpart in A2.

Match score = 4/6 = 0.67
Matching Example
• Question: When was Franz Kafka born?
• Passage 1: Franz Kafka died in 1924. (match score = 0.50)
• Passage 2: Kafka was born in 1883. (match score = 0.67)
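A minimal sketch of this flat matching, assuming the counting scheme of the two worked examples above: the answer literal counts as matched once its variable is bound to the candidate, and the remaining question literals are matched greedily against unused answer literals under a consistent variable substitution. The literal encoding and the helper function are illustrative, not Pronto's actual code.

# Flat semantic matching by greedy unification, reproducing the scores
# of the two worked examples above.

def match(q_literals, a_literals, answer_var, candidate):
    subst = {answer_var: candidate}     # e.g. X = x2
    matched = 1                         # the answer(...) literal counts as matched
    used = set()
    for pred, args in q_literals:
        if pred == "answer":
            continue
        for j, (a_pred, a_args) in enumerate(a_literals):
            if j in used or a_pred != pred or len(a_args) != len(args):
                continue
            trial, ok = dict(subst), True
            for q_arg, a_arg in zip(args, a_args):
                if trial.get(q_arg, a_arg) != a_arg:
                    ok = False          # inconsistent with earlier bindings
                    break
                trial[q_arg] = a_arg
            if ok:
                subst, matched = trial, matched + 1
                used.add(j)
                break
    return matched / len(q_literals)

Q  = [("answer", ["X"]), ("franz", ["Y"]), ("kafka", ["Y"]),
      ("born", ["E"]), ("patient", ["E", "Y"]), ("temp", ["E", "X"])]
A1 = [("franz", ["x1"]), ("kafka", ["x1"]), ("die", ["x3"]),
      ("agent", ["x3", "x1"]), ("in", ["x3", "x2"]), ("1924", ["x2"])]
A2 = [("kafka", ["x1"]), ("born", ["x3"]), ("patient", ["x3", "x1"]),
      ("in", ["x3", "x2"]), ("1883", ["x2"])]

print(round(match(Q, A1, "X", "x2"), 2))   # 0.5
print(round(match(Q, A2, "X", "x2"), 2))   # 0.67

The weighted matching mentioned on the next slide would simply replace the increment of 1 per matched literal with a per-predicate weight (e.g. higher weights for named entities).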
Matching Techniques
• Weighted matching
  – higher weight for named entities
  – estimate weights using machine learning
• Incorporate background knowledge
  – WordNet [hyponyms]
  – NomLex
  – paraphrases: BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
Question Answering – Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking (this section)
Architecture of PRONTO
[PRONTO pipeline diagram, repeated from above]
Answer selection
• Rank answers (see the sketch below)
  – group duplicates (syntactically or semantically equivalent)
  – sort on frequency
• How specific should an answer be?
  – semantic relations between answers
  – hyponyms, synonyms
  – answer modelling [PhD thesis Dalmas 2007]
• Answer cardinality
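A rough sketch of the grouping-and-counting step; lowercasing and whitespace normalisation stand in for the real syntactic/semantic equivalence tests.

from collections import Counter

# Group candidate answers that normalise to the same form and rank the
# groups by how often they were extracted.

def normalise(answer):
    return " ".join(answer.lower().split())   # crude equivalence test

def rank_answers(candidates):
    counts = Counter(normalise(a) for a in candidates)
    return counts.most_common()               # most frequent answers first

candidates = ["In Austria", "in Austria", "In  Austria",
              "In Kierling", "in Kierling", "Near Vienna"]
print(rank_answers(candidates))
# [('in austria', 3), ('in kierling', 2), ('near vienna', 1)]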
Answer selection example 1
• Where did Franz Kafka die?
– In his bed
– In a sanatorium
– In Kierling
– Near Vienna
– In Austria
– In Berlin
– In Germany
Answer selection example 2
• Where is 3M based?
– In Maplewood
– In Maplewood, Minn.
– In Minnesota
– In the U.S.
– In Maplewood, Minn., USA
– In San Francisco
– In the Netherlands
Architecture of PRONTO
[PRONTO pipeline diagram, repeated from above]
Reranking
• Most QA systems first produce a list of possible answers…
• This is usually followed by a process called reranking.
• Reranking promotes correct answers to a higher rank.
Factors in reranking
• Matching score
  – the better the match with the question, the more likely the answer
• Frequency
  – if the same answer occurs many times, it is likely to be correct
Answer Validation
• Answer validation
  – check whether an answer is likely to be correct, using an expensive method
• Tie breaking
  – deciding between two answers with similar probability
• Methods:
  – inference check
  – sanity checking
  – Googling
Inference check
• Use first-order logic [FOL] to check whether a potential answer entails the question.
• This can be done with a theorem prover:
  – translate Q into FOL
  – translate A into FOL
  – translate background knowledge into FOL
  – if ((BKfol & Afol) → Qfol) is a theorem, we have a likely answer
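A small sketch of this check using NLTK's logic tools (assumed to be installed); the formulas here are hand-written simplifications, whereas in Pronto they would come from the DRS-to-FOL translation.

from nltk.sem import Expression
from nltk.inference import ResolutionProver

read = Expression.fromstring

# BKfol: background knowledge, Afol: answer passage, Qfol: question
bk = read("all x.(year(x) -> date(x))")
a  = read("diedin(kafka, n1924) & year(n1924)")
q  = read("exists x.(diedin(kafka, x) & date(x))")

# (BKfol & Afol) -> Qfol is a theorem iff Qfol follows from the assumptions
print(ResolutionProver().prove(q, [bk, a]))   # True: a likely answer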
Sanity Checking
• An answer should be informative, that is, not part of the question.
  Q: Who is Tom Cruise married to?
  A: Tom Cruise
  Q: Where was Florence Nightingale born?
  A: Florence
Googling
• Given a ranked list of answers, some of these might not make sense at all.
• Promote answers that make sense.
• How? Use an even larger corpus!
  – “Sloppy” approach
  – “Strict” approach
The World Wide Web
Answer validation (sloppy)
• Given a question Q and a set of answers A1…An
• For each i, generate the query Q Ai
• Count the number of hits for each i
• Choose the Ai with the highest number of hits
• Use existing search engines
  – Google, AltaVista
  – Magnini et al. 2002 (CCP)
Corrected Conditional Probability
• Treat Q and A as bags of words
  – Q = the content words of the question
  – A = the answer
• CCP(Qsp,Asp) = hits(A NEAR Q) / (hits(A) × hits(Q))
• Accept answers above a certain CCP threshold (see the sketch below)
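A sketch of the CCP computation; hits() is a placeholder for a real search-engine count lookup, and the numbers are made up purely for illustration.

# Corrected Conditional Probability from (mocked) hit counts.

HIT_COUNTS = {                                  # made-up numbers
    "Guthrie born": 120000,
    "Oklahoma": 25000000,
    "Oklahoma NEAR Guthrie born": 900,
}

def hits(query):
    return HIT_COUNTS.get(query, 0)             # stand-in for a search API

def ccp(question_terms, answer):
    q = " ".join(question_terms)
    numerator = hits("{} NEAR {}".format(answer, q))
    denominator = hits(answer) * hits(q)
    return numerator / denominator if denominator else 0.0

score = ccp(["Guthrie", "born"], "Oklahoma")
print(score)        # accept the answer if the score exceeds a threshold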
Answer validation (strict)
• Given a question Q and a set of answers A1…An
• Create a declarative sentence with the focus of the question replaced by Ai
• Use the strict (exact phrase) search option in Google
  – high precision
  – low recall
• Any terms of the target not in the sentence are added to the query
Example
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Top-5 answers (* marks a correct answer):
  1) Britain
  * 2) Okemah, Okla.
  3) Newport
  * 4) Oklahoma
  5) New York
Example: generate queries
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries:
  1) “Guthrie was born in Britain”
  2) “Guthrie was born in Okemah, Okla.”
  3) “Guthrie was born in Newport”
  4) “Guthrie was born in Oklahoma”
  5) “Guthrie was born in New York”
Example: add target words
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries:
  1) “Guthrie was born in Britain” Woody
  2) “Guthrie was born in Okemah, Okla.” Woody
  3) “Guthrie was born in Newport” Woody
  4) “Guthrie was born in Oklahoma” Woody
  5) “Guthrie was born in New York” Woody
Example: morphological variants
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries:
  1) “Guthrie is OR was OR are OR were born in Britain” Woody
  2) “Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody
  3) “Guthrie is OR was OR are OR were born in Newport” Woody
  4) “Guthrie is OR was OR are OR were born in Oklahoma” Woody
  5) “Guthrie is OR was OR are OR were born in New York” Woody
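A sketch of how such strict-validation queries can be generated; the declarative template and the copula variants follow the example above, but the template itself would normally be produced by question analysis.

# Generate strict-validation queries: a quoted declarative sentence with
# the answer slotted in, copula variants OR-ed together, and any target
# words missing from the sentence appended outside the quotes.

COPULAS = "is OR was OR are OR were"

def strict_queries(target, template, answers):
    queries = []
    for answer in answers:
        sentence = template.format(copula=COPULAS, answer=answer)
        extra = [w for w in target.split()
                 if w.lower() not in sentence.lower()]
        queries.append('"{}" {}'.format(sentence, " ".join(extra)).strip())
    return queries

answers = ["Britain", "Okemah, Okla.", "Newport", "Oklahoma", "New York"]
for q in strict_queries("Woody Guthrie",
                        "Guthrie {copula} born in {answer}", answers):
    print(q)
# "Guthrie is OR was OR are OR were born in Britain" Woody
# "Guthrie is OR was OR are OR were born in Okemah, Okla." Woody
# ... and so on for the remaining answers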
Example: Google hits
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Generated queries and their hit counts:
  1) “Guthrie is OR was OR are OR were born in Britain” Woody: 0
  2) “Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody: 10
  3) “Guthrie is OR was OR are OR were born in Newport” Woody: 0
  4) “Guthrie is OR was OR are OR were born in Oklahoma” Woody: 42
  5) “Guthrie is OR was OR are OR were born in New York” Woody: 2
Example: reranked answers
• TREC 99.3
  Target: Woody Guthrie.
  Question: Where was Guthrie born?
• Original answers:
  1) Britain
  * 2) Okemah, Okla.
  3) Newport
  * 4) Oklahoma
  5) New York
• Reranked answers:
  * 4) Oklahoma
  * 2) Okemah, Okla.
  5) New York
  1) Britain
  3) Newport
Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Where to go from here
• Producing answers in real time
• Improving accuracy
• Answer explanation
• User modelling
• Speech interfaces
• Dialogue (interactive QA)
• Multi-lingual QA
• Non-sequential architectures