Question Answering
Search Engines & Question Answering
Giuseppe Attardi, Dipartimento di Informatica, Università di Pisa
Question Answering
IR: find documents relevant to a query
– query: a boolean combination of keywords
QA: find the answer to a question
– question: expressed in natural language
– answer: a short phrase (< 50 bytes)
TREC-9 Q&A track
693 fact-based, short-answer questions
– either short (50 byte) or long (250 byte) answers
~3 GB of newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)
Score: MRR, mean reciprocal rank (only the highest-ranked correct answer counts; see the sketch below)
Resources: top 50 documents per question (no answer in the collection for 130 questions)
Questions: 186 from Encarta, 314 seeded from Excite logs, 193 syntactic variants of 54 originals
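A minimal sketch of this MRR scoring, assuming each system returns a ranked list of candidate answers per question; only the highest-ranked correct answer contributes, so answers found at lower ranks are penalized:

```python
def mean_reciprocal_rank(ranked_answers, is_correct):
    """ranked_answers: list of answer lists, one per question (best first).
    is_correct: function (question_index, answer) -> bool."""
    total = 0.0
    for qi, answers in enumerate(ranked_answers):
        for rank, ans in enumerate(answers, start=1):
            if is_correct(qi, ans):
                total += 1.0 / rank   # only the first correct answer counts
                break                 # lower-ranked answers are ignored
    return total / len(ranked_answers)
```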
Commonalities
Approaches:
– question classification
– finding the entailed answer type
– use of WordNet
High-quality document search helps (e.g. Queens College)
Sample Questions
Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth
Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270
Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes
Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)
Q: Which country has the largest part of the rain forest?
A: Brazil (60%)
Question Types
Class 1 – A: single datum or list of items; C: who, when, where, how (old, much, large)
Class 2 – A: multi-sentence; C: extracted from multiple sentences
Class 3 – A: across several texts; C: comparative/contrastive
Class 4 – A: an analysis of retrieved information; C: synthesized coherently from several retrieved fragments
Class 5 – A: result of reasoning; C: world/domain knowledge and common-sense reasoning
Question subtypes
Class 1.A: about subjects, objects, manner, time or location
Class 1.B: about properties or attributes
Class 1.C: taxonomic nature
Results (long)
[Bar chart: MRR for long answers by participant (SMU, Queens, Waterloo, IBM, LIMSI, NTT, IC, Pisa); y-axis 0 to 0.8; legend: MRR, Unofficial]
Falcon: Architecture
[Architecture diagram with three stages:]
Question Processing: Question → Collins Parser + NE Extraction → Question Semantic Form → Question Taxonomy / Expected Answer Type (using WordNet) → Question Expansion → Question Logical Form
Paragraph Processing: Paragraph Index → Paragraph Filtering → Answer Paragraphs
Answer Processing: Collins Parser + NE Extraction → Answer Semantic Form → Answer Logical Form → Coreference Resolution → Abduction Filter → Answer
Question parse
[Parse tree for: "Who/WP was/VBD the/DT first/JJ Russian/NNP astronaut/NN to/TO walk/VB in/IN space/NN"]
Question semantic form
[Semantic graph: astronaut, with modifiers first and Russian; walk; space; the answer node is marked PERSON]
Answer type: PERSON
Question logical form: first(x) & astronaut(x) & Russian(x) & space(z) & walk(y, z, x) & PERSON(x)
Expected Answer Type
Question: What is the size of Argentina?
[WordNet hierarchy: size → dimension → QUANTITY, giving the expected answer type QUANTITY]
Questions about definitions
Special patterns:
– What {is|are} …?
– What is the definition of …?
– Who {is|was|are|were} …?
Answer patterns:
– … {is|are} …
– …, {a|an|the} …
– … -
Question Taxonomy
Question → Location, Reason, Product, Nationality, Manner, Number, Currency, Language, Mammal, Reptile, Game, Organization, …
– Location → Country, City, Province, Continent
– Number → Speed, Degree, Dimension, Rate, Duration, Percentage, Count
Question expansion
Morphological variants
– invented → inventor
Lexical variants
– killer → assassin
– far → distance
Semantic variants
– like → prefer
Indexing for Q/A
Alternatives:
– IR techniques
– parse texts and derive conceptual indexes
Falcon uses paragraph indexing:
– vector-space plus proximity
– returns weights used for abduction
Abduction to justify answers
Backchaining proofs from questions
Axioms:
– logical form of the answer
– world knowledge (WordNet)
– coreference resolution in the answer text
Effectiveness:
– 14% improvement
– filters 121 erroneous answers (of 692)
– requires 60% of question processing time
TREC 13 QA
Several subtasks:
– factoid questions
– definition questions
– list questions
– context questions
LCC still has the best performance, but with a different architecture
LCC Block Architecture
[Block diagram, from question Q to answer A:]
Question Processing (captures the semantics of the question, selects keywords for passage retrieval): Question Parse → Semantic Transformation → Recognition of Expected Answer Type → Keyword Extraction; uses NER and WordNet; outputs Question Semantics and Keywords
Passage Retrieval (extracts and ranks passages using surface-text techniques): Document Retrieval → Passages
Answer Processing (extracts and ranks answers using NL techniques): Answer Extraction → Theorem Prover with Axiomatic Knowledge Base (Answer Justification) → Answer Reranking; uses NER and WordNet; outputs the answer A
Question Processing
Two main tasks:
– determining the type of the answer
– extracting keywords from the question and formulating a query
Answer Types
Factoid questions…
– Who, where, when, how many…
– The answers fall into a limited and somewhat predictable set of categories
  • Who questions are going to be answered by…
  • Where questions…
Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract
Answer Types
Of course, it isn’t that easy…
– Who questions can have organizations as answers
  • Who sells the most hybrid cars?
– Which questions can have people as answers
  • Which president went to war with Mexico?
Answer Type Taxonomy
Contains ~9000 concepts reflecting expected answer types
Merges named entities with the WordNet hierarchy
Answer Type Detection
Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.
Not worthwhile to do something complex here if it can’t also be done in candidate answer passages.
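A minimal sketch of the rule-based half of answer type detection, assuming a small hand-written pattern table; the category names and rules are illustrative, not the LCC taxonomy, and real systems combine such rules with a supervised classifier:

```python
import re

# Illustrative hand-crafted rules; a trained classifier would back these up.
ANSWER_TYPE_RULES = [
    (r"^who\b", "PERSON"),
    (r"^where\b", "LOCATION"),
    (r"^when\b", "DATE"),
    (r"^how many\b", "COUNT"),
    (r"^how (long|far)\b", "DIMENSION"),
    (r"^what .*\b(city|country|state)\b", "LOCATION"),
]

def detect_answer_type(question: str, default: str = "OTHER") -> str:
    q = question.lower().strip()
    for pattern, answer_type in ANSWER_TYPE_RULES:
        if re.search(pattern, q):
            return answer_type
    return default

print(detect_answer_type("Who sells the most hybrid cars?"))     # PERSON (the rule misfires: really an ORGANIZATION)
print(detect_answer_type("Where is the Louvre Museum located?")) # LOCATION
```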
Keyword Selection
Answer Type indicates what the question is looking for:
– it can be mapped to a NE type and used for search in an enhanced index
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
Keyword Extraction
Questions are approximated by sets of unrelated keywords (examples from the TREC QA track):
Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer
Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
Passage Retrieval
[Block diagram as above, with the Passage Retrieval stage highlighted: extracts and ranks passages using surface-text techniques]
Passage Extraction Loop
Passage Extraction Component
– extracts passages that contain all selected keywords
– passage size is dynamic
– start position is dynamic
Passage quality and keyword adjustment
– in the first iteration, use the first 6 keyword selection heuristics
– if the number of passages is lower than a threshold, the query is too strict: drop a keyword
– if the number of passages is higher than a threshold, the query is too relaxed: add a keyword
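A minimal sketch of this retrieval loop with keyword relaxation, assuming a retrieve_passages(keywords) search function and a keyword list ordered by the selection heuristics; the thresholds are illustrative:

```python
def passage_extraction_loop(ranked_keywords, retrieve_passages,
                            min_passages=10, max_passages=500, initial=6):
    """ranked_keywords: keywords ordered by the selection heuristics (most selective first).
    retrieve_passages: callable returning passages that contain all given keywords."""
    k = min(initial, len(ranked_keywords))   # first iteration: keywords from the first 6 heuristics
    tried = set()
    passages = retrieve_passages(ranked_keywords[:k])
    while k not in tried:
        tried.add(k)
        if len(passages) < min_passages and k > 1:
            k -= 1                           # query too strict: drop a keyword
        elif len(passages) > max_passages and k < len(ranked_keywords):
            k += 1                           # query too relaxed: add a keyword
        else:
            break
        passages = retrieve_passages(ranked_keywords[:k])
    return passages
```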
Passage Scoring
Passages are scored based on keyword windows
– For example, if a question has the keyword set {k1, k2, k3, k4}, and in a passage k1 and k2 are each matched twice, k3 is matched once, and k4 is not matched, four windows are built, one for each combination of the k1 and k2 occurrences.
[Figure: the four windows over the matched sequence k1 k2 k3 k2 k1]
Passage Scoring
Passage ordering is performed using a sort that involves three scores:
– the number of words from the question that are recognized in the same sequence in the window
– the number of words that separate the most distant keywords in the window
– the number of unmatched keywords in the window
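A minimal sketch of the three window scores, assuming a window is a list of tokens and the question keywords are known; the precedence among the scores in the sort is an assumption:

```python
def window_scores(window_tokens, question_words, keywords):
    """Returns (same_sequence, span, unmatched) for one keyword window.
    Larger same_sequence is better; smaller span and unmatched are better."""
    # 1. question words recognized in the same order as in the question (greedy match)
    qi = 0
    for tok in window_tokens:
        if qi < len(question_words) and tok == question_words[qi]:
            qi += 1
    same_sequence = qi

    # 2. words separating the most distant matched keywords in the window
    positions = [i for i, tok in enumerate(window_tokens) if tok in keywords]
    span = (positions[-1] - positions[0] - 1) if len(positions) > 1 else 0

    # 3. question keywords with no match in the window
    unmatched = sum(1 for k in keywords if k not in window_tokens)
    return same_sequence, span, unmatched

def rank_windows(windows, question_words, keywords):
    def key(w):
        seq, span, unmatched = window_scores(w, question_words, keywords)
        return (-seq, span, unmatched)   # assumed precedence of the three scores
    return sorted(windows, key=key)
```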
Answer Extraction
[Block diagram as above, with the Answer Processing stage highlighted: extracts and ranks answers using NL techniques]
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Candidate PERSON entities in the passage: Christa McAuliffe, McAuliffe, Karen Allen, Brian Kerwin, Mike Smith
Best candidate answer: Christa McAuliffe
Features for Answer Ranking
– Number of question terms matched in the answer passage
– Number of question terms matched in the same phrase as the candidate answer
– Number of question terms matched in the same sentence as the candidate answer
– Flag set to 1 if the candidate answer is followed by a punctuation sign
– Number of question terms matched, separated from the candidate answer by at most three words and one comma
– Number of terms occurring in the same order in the answer passage as in the question
– Average distance from the candidate answer to the question term matches
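A minimal sketch of a few of these features, assuming the passage is tokenized and that sentence/phrase boundaries and the candidate's token span come from earlier processing stages; the feature names are illustrative:

```python
def answer_features(question_terms, passage_tokens, candidate_span, sentence_of, phrase_of):
    """candidate_span: (start, end) token indices of the candidate answer.
    sentence_of / phrase_of: callables mapping a token index to a sentence/phrase id."""
    q = set(question_terms)
    matches = [i for i, tok in enumerate(passage_tokens) if tok in q]
    start, end = candidate_span

    same_sentence = sum(1 for i in matches if sentence_of(i) == sentence_of(start))
    same_phrase = sum(1 for i in matches if phrase_of(i) == phrase_of(start))
    followed_by_punct = int(end < len(passage_tokens) and passage_tokens[end] in ",.;:!?")
    distances = [min(abs(i - start), abs(i - (end - 1))) for i in matches]
    avg_distance = sum(distances) / len(distances) if distances else 0.0

    return {
        "terms_matched": len(matches),
        "terms_in_same_sentence": same_sentence,
        "terms_in_same_phrase": same_phrase,
        "followed_by_punctuation": followed_by_punct,
        "avg_distance_to_matches": avg_distance,
    }
```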
Lexical Chains
Question: When was the internal combustion engine invented?
Answer: The first internal combustion engine was built in 1867.
Lexical chain: invent:v#1 →HYPERNYM→ create_by_mental_act:v#1 →HYPERNYM→ create:v#1 →HYPONYM→ build:v#1
Question: How many chromosomes does a human zygote have?
Answer: 46 chromosomes lie in the nucleus of every normal human cell.
Lexical chain: zygote:n#1 →HYPERNYM→ cell:n#1 →HAS.PART→ nucleus:n#1
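A minimal sketch of building such chains with NLTK's WordNet interface, doing a breadth-first search over a few relation types; this is an approximation of the idea, not the Falcon/LCC implementation:

```python
from collections import deque
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

RELATIONS = {
    "HYPERNYM": lambda s: s.hypernyms(),
    "HYPONYM": lambda s: s.hyponyms(),
    "HAS.PART": lambda s: s.part_meronyms(),
    "PART.OF": lambda s: s.part_holonyms(),
}

def lexical_chain(source, target, pos, max_depth=3):
    """Breadth-first search for a chain of WordNet relations linking two words."""
    targets = set(wn.synsets(target, pos=pos))
    queue = deque((s, [s.name()]) for s in wn.synsets(source, pos=pos))
    visited = set()
    while queue:
        synset, path = queue.popleft()
        if synset in targets:
            return path
        if synset in visited or (len(path) - 1) // 2 >= max_depth:
            continue
        visited.add(synset)
        for rel_name, rel in RELATIONS.items():
            for nxt in rel(synset):
                queue.append((nxt, path + [rel_name, nxt.name()]))
    return None

print(lexical_chain("invent", "build", wn.VERB))
print(lexical_chain("zygote", "nucleus", wn.NOUN))
```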
Theorem Prover
Q: What is the age of the solar system?
QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)
Question Axiom: exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3))
Answer: The solar system is 4.6 billion years old.
WordNet gloss (old): old_jj(x6): live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)
Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))
Proof (by refutation): ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3); the refutation assigns a value to x2
Is the Web Different?
In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.
The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.
But…
The Web is Different
On the Web popular factoids are likely to be expressed in a gazillion different ways.
At least a few of which will likely match the way the question was asked.
So why not just grep (or agrep) the Web using all or pieces of the original question?
AskMSR
Process the question by…
– forming a search engine query from the original question
– detecting the answer type
Get some results
Extract answers of the right type based on
– how often they occur
Step 1: Rewrite the questions
Intuition: the user’s question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
  • The Louvre Museum is located in Paris
– Who created the character of Scrooge?
  • Charles Dickens created the character of Scrooge.
Query rewriting
Classify the question into seven categories
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Hand-crafted, category-specific transformation rules; e.g. for Where questions, move "is" to all possible locations and look to the right of the query terms for the answer:
"Where is the Louvre Museum located?"
→ "is the Louvre Museum located"
→ "the is Louvre Museum located"
→ "the Louvre is Museum located"
→ "the Louvre Museum is located"
→ "the Louvre Museum located is"
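A minimal sketch of the "move 'is'" rewrite, a rough approximation of the AskMSR idea; the exact rule set and any rewrite weights are not taken from the source:

```python
def rewrite_where_question(question):
    """Generate rewrites of a 'Where is X ...?' question by moving 'is'
    to every possible position, as exact-phrase search queries."""
    words = question.rstrip("?").split()
    if len(words) < 2 or words[0].lower() != "where" or words[1].lower() != "is":
        return [question]          # not a 'Where is' question: fall back to the original
    rest = words[2:]               # drop the question word, keep the rest
    rewrites = []
    for i in range(len(rest) + 1):
        candidate = rest[:i] + ["is"] + rest[i:]
        rewrites.append('"' + " ".join(candidate) + '"')
    return rewrites

for r in rewrite_where_question("Where is the Louvre Museum located?"):
    print(r)
```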
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve top N answers (100-200)
For speed, rely just on search engine’s “snippets”, not the full text of the actual document
Step 3: Gathering N-Grams
Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
Weight of an n-gram: occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document
– Example: "Who created the character of Scrooge?"
– Dickens 117, Christmas Carol 78, Charles Dickens 75, Disney 72, Carl Banks 54, A Christmas 41, Christmas Carol 45, Uncle 31
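A minimal sketch of n-gram gathering and weighting, assuming each snippet comes with the reliability weight of the rewrite that produced it; the weights in the example are illustrative:

```python
from collections import Counter

def collect_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: list of (snippet_text, rewrite_weight) pairs.
    Returns a Counter mapping each 1/2/3-gram to its weighted occurrence count."""
    scores = Counter()
    for text, weight in snippets_with_weights:
        tokens = text.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram = " ".join(tokens[i:i + n])
                scores[ngram] += weight      # each occurrence counts, scaled by rewrite reliability
    return scores

snippets = [("Charles Dickens created the character of Scrooge", 5.0),
            ("Scrooge appears in A Christmas Carol by Dickens", 1.0)]
print(collect_ngrams(snippets).most_common(5))
```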
Step 4: Filtering N-Grams
Each question type is associated with one or more "data-type filters", i.e. regular expressions for answer types
Boost the score of n-grams that match the expected answer type.
Lower the score of n-grams that don’t match.
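A minimal sketch of such data-type filters as regular expressions; the patterns and the boost/penalty factors are illustrative, not the AskMSR ones:

```python
import re

# Illustrative data-type filters keyed by question type
DATA_TYPE_FILTERS = {
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),          # a year
    "how many": re.compile(r"\b\d[\d,]*\b"),                    # a number
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),     # a capitalized name
}

def filter_ngrams(scores, question_type, boost=2.0, penalty=0.5):
    """Boost n-grams matching the expected answer type, penalize the rest."""
    pattern = DATA_TYPE_FILTERS.get(question_type)
    if pattern is None:
        return scores
    return {ng: s * (boost if pattern.search(ng) else penalty)
            for ng, s in scores.items()}
```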
Step 5: Tiling the Answers
Overlapping n-grams are merged and the old n-grams are discarded; their scores are summed.
Example: "Charles Dickens" (20), "Dickens" (15), "Mr Charles" (10) → tiled into "Mr Charles Dickens" (score 45)
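A minimal sketch of answer tiling: repeatedly merge pairs of candidates where one contains the other or overlaps its end, summing their scores; the greedy strategy is an assumption:

```python
def tile(a, b):
    """Return the tiled phrase if b is contained in a or overlaps the end of a, else None."""
    wa, wb = a.split(), b.split()
    if all(w in wa for w in wb):                      # b contained in a
        return a
    for k in range(min(len(wa), len(wb)), 0, -1):     # longest suffix of a == prefix of b
        if wa[-k:] == wb[:k]:
            return " ".join(wa + wb[k:])
    return None

def tile_answers(scores):
    """scores: dict n-gram -> score. Greedily merge overlapping candidates."""
    candidates = dict(scores)
    merged = True
    while merged:
        merged = False
        items = sorted(candidates, key=candidates.get, reverse=True)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                t = tile(a, b) or tile(b, a)
                if t is not None:
                    score = candidates.pop(a) + candidates.pop(b)
                    candidates[t] = candidates.get(t, 0) + score
                    merged = True
                    break
            if merged:
                break
    return candidates

print(tile_answers({"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}))
# {'Mr Charles Dickens': 45}
```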
Results
Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions
– the technique does OK, not great (would have placed in the top 9 of ~30 participants)
– but with access to the Web they do much better: would have come in second at TREC 2001
Harder Questions
Factoid question answering is really pretty silly.
A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.
– Who is Condoleezza Rice?
– Who is Mahmoud Abbas?
– Why was Arafat flown to Paris?
IXE Components
IXE Framework
[Diagram: IXE framework components, including Passage Index, NE Tagger, EventStream, ContextStream, Sentence Splitter, POS Tagger, Clustering, MaxEntropy, Indexer, Crawler, Web Service, Suffix Trees, RegExp Search, Readers, Object Store, Unicode, Memory Mapping, Files, Synchronization, OS Abstraction, Text, and wrappers for Python, Perl and Java]
Language Processing Tools
Maximum Entropy classifier
Sentence Splitter
Multi-language POS Tagger
Multi-language NE Tagger
Conceptual clustering
Maximum Entropy
Machine Learning approach to classification:
– system trained on training cases
– learned model used for predictions
The classification problem is described by a number of features
Each feature corresponds to a constraint on the model
Maximum entropy model: the model with the maximum entropy among all the models that satisfy the constraints
Choosing a model with less entropy would add 'information' constraints not justified by the empirical evidence available
MaxEntropy: example data
Features → Outcome
Sunny, Happy → Outdoor
Sunny, Happy, Dry → Outdoor
Sunny, Happy, Humid → Outdoor
Sunny, Sad, Dry → Outdoor
Sunny, Sad, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Rainy, Happy, Humid → Indoor
Rainy, Happy, Dry → Indoor
Rainy, Sad, Dry → Indoor
Rainy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor
MaxEnt: example predictions
Context → P(Outdoor), P(Indoor)
Cloudy, Happy, Humid → 0.771, 0.228
Rainy, Sad, Humid → 0.001, 0.998
MaxEntropy: application
Sentence Splitting
Not all punctuation marks are sentence boundaries:
– U.S.A.
– St. Helen
– 3.14
Use features like:
– capitalization (previous, next word)
– presence in an abbreviation list
– suffix/prefix digits
– suffix/prefix long
Precision: > 95%
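A minimal sketch of extracting such features around a candidate boundary (a period), to feed a maximum entropy classifier; the feature names and the abbreviation list are illustrative:

```python
ABBREVIATIONS = {"U.S.A.", "St.", "Mr.", "Dr.", "etc."}   # illustrative list

def boundary_features(tokens, i):
    """Features for deciding whether the period ending tokens[i] is a sentence boundary."""
    word = tokens[i]
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    prev = tokens[i - 1] if i > 0 else ""
    return {
        "next_capitalized": nxt[:1].isupper(),
        "prev_capitalized": prev[:1].isupper(),
        "in_abbreviation_list": word in ABBREVIATIONS,
        "has_digits": any(c.isdigit() for c in word),     # e.g. 3.14
        "long_prefix": len(word.rstrip(".")) > 3,
    }

print(boundary_features(["He", "lives", "in", "the", "U.S.A.", "Nowadays"], 4))
```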
Part of Speech Tagging
TreeTagger: statistical package based on HMM and decision trees
Trained on manually tagged text
Full language lexicon (with all inflections: 140,000 words for Italian)
Training Corpus
Il → DET:def:*:*:masc:sg → _il
presidente → NOM:*:*:*:masc:sg → _presidente
della → PRE:det:*:*:femi:sg → _del
Repubblica → NOM:*:*:*:femi:sg → _repubblica
francese → ADJ:*:*:*:femi:sg → _francese
Francois → NPR:*:*:*:*:* → _Francois
Mitterrand → NPR:*:*:*:*:* → _Mitterrand
ha → VER:aux:pres:3:*:sg → _avere
proposto → VER:*:pper:*:masc:sg → _proporre
…
Named Entity Tagger
Uses MaxEntropy
– NE categories, top level: NAME, ORGANIZATION, LOCATION, QUANTITY, TIME, EVENT, PRODUCT
– second level: 30-100 categories, e.g. QUANTITY: MONEY, CARDINAL, PERCENT, MEASURE, VOLUME, AGE, WEIGHT, SPEED, TEMPERATURE, etc.
See resources at CoNLL (cnts.uia.ac.be/conll2004)
NE Features
Feature types:
– word-level (e.g. capitalization, digits, etc.)
– punctuation
– POS tag
– category designator (Mr, Av.)
– category suffix (center, museum, street, etc.)
– lowercase intermediate terms (of, de, in)
– presence in controlled dictionaries (locations, people, organizations)
Context: words at positions -1, 0, +1
Sample training document
When the first American style burger joint opened in
Now it's
And that's continued bad news for
Clustering
Classification: assign an item to one among a given set of classes
Clustering: find groupings of similar items (i.e. generate the classes )
Conceptual Clustering of results
Similar to Vivisimo
– built on the fly rather than from predefined categories (as in Northern Light)
Generalized suffix tree of snippets
Stemming
Stop words (articulated, essential)
Demo: python, upnp
PiQASso: Pisa Question Answering System
"Computers are useless, they can only give answers" (Pablo Picasso)
PiQASso Architecture
[Architecture diagram: the document collection is processed by a Sentence Splitter, MiniPar and the Answer Indexer; question analysis (MiniPar, Question Classification, Query Formulation/Expansion, with WNSense and WordNet) feeds answer analysis (Relation Matching, Type Matching, Answer Scoring, Popularity Ranking), looping back to query expansion until an answer is found]
Linguistic tools
WNSense extracts lexical knowledge from WordNet:
– classifies words according to WordNet top-level categories, weighting their senses
– computes distances between words based on is-a links
– suggests word alternatives for query expansion
Example: Theatre
– categorization: artifact 0.60, communication 0.40
– synonyms: dramaturgy, theater, house, dramatics
Minipar [D. Lin]
– identifies dependency relations between words (e.g. subject, object, modifiers)
– provides POS tagging
– detects semantic types of words (e.g. location, person, organization)
– extensible: we integrated a Maximum Entropy based Named Entity Tagger
[Dependency parse of "What metal has the highest melting point?" with relations obj, subj, lex-mod, mod]
Question Analysis
What metal has the highest melting point?
Pipeline on the example:
1. Parsing: [dependency tree with relations obj, subj, lex-mod, mod]
2. Keyword extraction: metal, highest, melting, point
3. Answer type detection: SUBSTANCE
4. Relation extraction: <SUBSTANCE, has, subj>, <point, has, obj>, <melting, point, lex-mod>, <highest, point, mod>
Steps:
1. The NL question is parsed
2. POS tags are used to select search keywords
3. The expected answer type is determined by applying heuristic rules to the dependency tree
4. Additional relations are inferred and the answer entity is identified
Answer Analysis
Tungsten is a very dense material and has the highest melting point of any metal.
Pipeline on the example:
1. Parsing
2. Answer type check: SUBSTANCE
3. Relation extraction: <tungsten, material, pred>, <tungsten, has, subj>, <point, has, obj>, …
4. Matching distance
5. Distance filtering
6. Popularity ranking → ANSWER: Tungsten
Steps:
1. Retrieved paragraphs are parsed
2. Paragraphs not containing an entity of the expected type are discarded
3. Dependency relations are extracted from Minipar output
4. The matching distance between word relations in the question and in the answer is computed
5. Paragraphs that are too distant are filtered out
6. Popularity rank is used to weight distances
Match Distance between Question and Answer
Analyze relations between corresponding words, considering:
– the number of matching words in the question and in the answer
– the distance between words (e.g. moon matching satellite)
– relation types (e.g. words related by subj in the question while the matching words in the answer are related by pred)
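A minimal sketch of such a matching distance, assuming relations are (head, dependent, type) triples and that a word-level similarity (for example WordNet-based) is available; the weights are illustrative:

```python
def relation_distance(q_rel, a_rel, word_sim, type_penalty=0.5):
    """Distance between one question relation and one answer relation.
    q_rel, a_rel: (head, dependent, relation_type) triples.
    word_sim: similarity in [0, 1] between two words (e.g. moon vs. satellite)."""
    (qh, qd, qt), (ah, ad, at) = q_rel, a_rel
    d = (1 - word_sim(qh, ah)) + (1 - word_sim(qd, ad))   # lexical mismatch
    if qt != at:
        d += type_penalty                                  # e.g. subj in question vs. pred in answer
    return d

def match_distance(q_relations, a_relations, word_sim):
    """Sum, over question relations, of the distance to the closest answer relation."""
    total = 0.0
    for qr in q_relations:
        total += min(relation_distance(qr, ar, word_sim) for ar in a_relations)
    return total
```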
http://medialab.di.unipi.it/askpiqasso.html
Improving PIQASso
More NLP
NLP techniques have been largely unsuccessful at information retrieval
– document retrieval as the primary measure of information retrieval success
  • document retrieval reduces the need for NLP techniques
    – discourse factors can be ignored
    – query words perform word-sense disambiguation
– lack of robustness:
  • NLP techniques are typically not as robust as word indexing
How do these technologies help?
Question Analysis:
– the tag of the predicted answer category is added to the query
Named-Entity Detection:
– the NE categories found in the text are included as tags in the index
Example: What party is John Kerry in? (ORGANIZATION)
"John Kerry defeated John Edwards in the primaries for the Democratic Party." Tags: PERSON, ORGANIZATION
NLP Technologies
Coreference Relations:
– the interpretation of a paragraph may depend on the context in which it occurs
Description Extraction:
– appositive and predicate nominative constructions provide descriptive terms about entities
Coreference Relations
Represented as annotations associated with words, i.e. placed at the same position as the reference.
Example: How long was Margaret Thatcher the prime minister? (DURATION)
"The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore."
Tags: DURATION. Colocated: her, MARGARET THATCHER
Description Extraction
Identifies the DESCRIPTION category
Allows descriptive terms to be used in term expansion
Examples: Who is Frank Gary? (DESCRIPTION); What architect designed the Guggenheim Museum in Bilbao? (PERSON)
"Famed architect Frank Gary…"
"Buildings he designed include the Guggenheim Museum in Bilbao."
Tags: DESCRIPTION, PERSON, LOCATION. Colocation: he, FRANK GARY
NLP Technologies
Question Analysis:
– identify the semantic type of the expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper nouns and numeric amounts in text
Will it work?
Will these semantic relations improve paragraph retrieval?
– Are the implementations robust enough to see a benefit across large document collections and question sets?
– Are there enough questions where these relationships are required to find an answer?
Hopefully yes!
Preprocessing
Paragraph Detection
Sentence Detection
Tokenization
POS Tagging
NP-Chunking
Queries to a NE enhanced index
– text matches bush
– text matches PERSON:bush
– text matches LOCATION:* & PERSON:bin-laden
– text matches DURATION:* PERSON:margaret-thatcher prime minister
Coreference
Task:
– determine the space of entity extents:
  • basal noun phrases
    – named entities consisting of multiple basal noun phrases are treated as a single entity
  • pre-nominal proper nouns
  • possessive pronouns
– determine which extents refer to the same entity in the world
Paragraph Retrieval
Indexing:
– add NE tags for each NE category present in the text
– add coreference relationships
– use syntactically-based categorical relations to create a DESCRIPTION category for term expansion
– use the IXE passage indexer
High Composability
[Diagram: DocInfo (name, date, size), PassageDoc (text, boundaries), Collection]
Tagged Documents
select documents where
– text matches bush
– text matches PERSON:bush
– text matches osama & LOCATION:*
Cursor classes: QueryCursor, QueryCursorWord, QueryCursorTaggedWord
Combination
Searching passages on a collection of tagged documents (QueryCursor)
Paragraph Retrieval
Retrieval:
– use the question analysis component to predict the answer category and append it to the question
– evaluate using TREC questions and answer patterns
  • 500 questions
System Overview
[Diagram: Indexing pipeline: Documents → Paragraph Splitter → Sentence Splitter → Tokenization → POS Tagger → NE Recognizer → Coreference Resolution → Description Extraction → IXE Indexer → Paragraphs+. Retrieval pipeline: Question → Question Analysis → IXE Search → Paragraphs]
Conclusion
QA is a challenging task
It involves state-of-the-art techniques from various fields:
– IR
– NLP
– AI
– managing large data sets
– advanced software technologies