Question Answering


Search Engines & Question Answering Giuseppe Attardi

Dipartimento di Informatica, Università di Pisa

Question Answering

IR: find documents relevant to query

query: boolean combination of keywords

QA: find answer to question

Question: expressed in natural language

Answer: short phrase (< 50 bytes)

TREC-9 Q&A track

693 fact-based, short answer questions

either short (50 B) or long (250 B) answer

~3 GB newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)

Score: MRR (mean reciprocal rank: a correct answer at rank r earns 1/r, so an answer found only in second place gets half credit; a computation sketch follows this slide)

Resources: top 50 retrieved documents per question (no answer found for 130 questions)

Questions: 186 (Encarta), 314 (seeds from Excite logs), 193 (syntactic variants of 54 originals)
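For reference, a minimal sketch of how MRR can be computed from ranked answer lists (function and variable names are illustrative; the five-answer cutoff follows the TREC convention):

```python
def mean_reciprocal_rank(ranked_answers, is_correct, max_rank=5):
    """MRR over a set of questions.

    ranked_answers: dict mapping question id -> list of answer strings,
                    best answer first (at most max_rank are considered).
    is_correct:     function (question_id, answer) -> bool.
    A question scores 1/r for the highest-ranked correct answer at rank r,
    and 0 if none of the top answers is correct.
    """
    total = 0.0
    for qid, answers in ranked_answers.items():
        for rank, answer in enumerate(answers[:max_rank], start=1):
            if is_correct(qid, answer):
                total += 1.0 / rank
                break
    return total / len(ranked_answers)
```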

Commonalities

Approaches:

question classification

finding entailed answer type

use of WordNet

High-quality document search helpful (e.g. Queens College)

Sample Questions

Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth

Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270

Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes

Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)

Q: Which country has the largest part of the rain forest?
A: Brazil (60%)

Question Types

Class 1. A: single datum or list of items. C: who, when, where, how (old, much, large)

Class 2. A: multi-sentence. C: extracted from multiple sentences

Class 3. A: across several texts. C: comparative/contrastive

Class 4. A: an analysis of retrieved information. C: synthesized coherently from several retrieved fragments

Class 5. A: result of reasoning. C: world/domain knowledge and common sense reasoning

Question subtypes

Class 1.A: about subjects, objects, manner, time or location

Class 1.B: about properties or attributes

Class 1.C: taxonomic nature

Results (long)

[Bar chart: MRR for long answers by system: SMU, Queens, Waterloo, IBM, LIMSI, NTT, IC, Pisa (unofficial); scale 0 to 0.8]

Falcon: Architecture

Question Processing: Question → Collins Parser + NE Extraction → Question Semantic Form → Question Taxonomy → Expected Answer Type; WordNet-based Question Expansion; Question Logical Form

Paragraph Processing: Paragraph Index → Paragraph Filtering → Answer Paragraphs

Answer Processing: Collins Parser + NE Extraction → Answer Semantic Form → Answer Logical Form → Coreference Resolution → Abduction Filter → Answer

Question parse

[Parse tree for the question "Who was the first Russian astronaut to walk in space", with phrase nodes S, NP, VP, PP and POS tags WP, VBD, DT, JJ, NNP, NN, TO, VB, IN]

Question semantic form

Semantic form (concepts): first, Russian, astronaut, walk, space

Expected answer type: PERSON

Question logic form: first(x) & astronaut(x) & Russian(x) & space(z) & walk(y, z, x) & PERSON(x)

Expected Answer Type

WordNet

Example: What is the size of Argentina? The question term "size" maps through the WordNet hierarchy (size → dimension → QUANTITY), so the expected answer type is QUANTITY.

Questions about definitions

Special patterns:

What {is|are} …?

What is the definition of …?

Who {is|was|are|were} …?

Answer patterns:

…{is|are}

…, {a|an|the}

… -

Question Taxonomy

[Taxonomy tree rooted at Question, with categories including Location, Reason, Product, Nationality, Manner, Number, Currency, Language, Mammal, Reptile, Game, Organization, Country, City, Province, Continent, Speed, Degree, Dimension, Rate, Duration, Percentage, Count]

Question expansion

Morphological variants: invented → inventor

Lexical variants: killer → assassin; far → distance

Semantic variants: like → prefer

Indexing for Q/A

Alternatives:

IR techniques

Parse texts and derive conceptual indexes

Falcon uses paragraph indexing:

Vector-Space plus proximity

Returns weights used for abduction

Abduction to justify answers

Backchaining proofs from questions

Axioms:

Logical form of answer

World knowledge (WordNet)

Coreference resolution in answer text

Effectiveness:

14% improvement

Filters 121 erroneous answers (of 692)

Requires 60% of question processing time

TREC 13 QA

Several subtasks:

Factoid questions

Definition questions

List questions

Context questions

LCC still has the best performance, but with a different architecture

LCC Block Architecture

Question Processing (uses NER and WordNet): Question Parse → Semantic Transformation → Recognition of Expected Answer Type → Keyword Extraction. Captures the semantics of the question and selects keywords for passage retrieval, producing Question Semantics and Keywords.

Passage Retrieval: Document Retrieval → Passages. Extracts and ranks passages using surface-text techniques.

Answer Processing (uses NER and WordNet): Answer Extraction → Answer Justification with a Theorem Prover and an Axiomatic Knowledge Base → Answer Reranking. Extracts and ranks answers using NL techniques to produce the answer A.

Question Processing

Two main tasks

Determining the type of the answer

Extract keywords from the question and formulate a query

Answer Types

Factoid questions…

Who, where, when, how many… The answers fall into a limited and somewhat predictable set of categories:

Who questions are going to be answered by a person (or organization)

Where questions by a location

Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract

Answer Types

Of course, it isn’t that easy…

Who questions can have organizations as answers

Who sells the most hybrid cars?

Which questions can have people as answers

Which president went to war with Mexico?

Answer Type Taxonomy

Contains ~9000 concepts reflecting expected answer types

Merges named entities with the WordNet hierarchy

Answer Type Detection

Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.

Not worthwhile to do something complex here if it can’t also be done in candidate answer passages.

Keyword Selection

Answer Type indicates what the question is looking for:

It can be mapped to a NE type and used for search in enhanced index

Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations provide the required context.

Keyword Extraction

Questions are approximated by sets of unrelated keywords. Examples (from the TREC QA track):

Q002: What was the monetary value of the Nobel Peace Prize in 1989?
Keywords: monetary, value, Nobel, Peace, Prize

Q003: What does the Peugeot company manufacture?
Keywords: Peugeot, company, manufacture

Q004: How much did Mercury spend on advertising in 1993?
Keywords: Mercury, spend, advertising, 1993

Q005: What is the name of the managing director of Apricot Computer?
Keywords: name, managing, director, Apricot, Computer

Keyword Selection Algorithm

1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
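A minimal sketch of how these ordered heuristics could be applied, assuming the question has already been parsed into tokens carrying POS tags and named-entity and quotation flags; heuristics 3 to 6 are collapsed into a single noun rule here, and all names are illustrative rather than the LCC implementation:

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str            # e.g. "NNP", "NN", "JJ", "VB"
    in_quotes: bool = False
    in_named_entity: bool = False
    stopword: bool = False

def select_keywords(tokens, answer_type_word, max_keywords=6):
    """Apply the ordered heuristics until enough keywords are collected."""
    heuristics = [
        lambda t: t.in_quotes and not t.stopword,             # 1. non-stopwords in quotations
        lambda t: t.pos == "NNP" and t.in_named_entity,       # 2. NNP words in named entities
        lambda t: t.pos.startswith("NN") and not t.stopword,  # 3-6. nominals and nouns (collapsed)
        lambda t: t.pos.startswith("VB"),                     # 7. verbs
    ]
    keywords = []
    for rule in heuristics:
        for tok in tokens:
            if rule(tok) and tok.text not in keywords:
                keywords.append(tok.text)
        if len(keywords) >= max_keywords:
            break
    keywords.append(answer_type_word)                         # 8. the answer type word
    return keywords
```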

Passage Retrieval

Extracts and ranks passages using surface-text techniques

[The LCC block architecture diagram is repeated here, with the Passage Retrieval module (Document Retrieval → Passages) highlighted]

Passage Extraction Loop

Passage Extraction Component:

Extracts passages that contain all selected keywords

Passage size is dynamic

Start position is dynamic

Passage quality and keyword adjustment:

In the first iteration, use the first six keyword selection heuristics

If the number of passages is lower than a threshold, the query is too strict: drop a keyword

If the number of passages is higher than a threshold, the query is too relaxed: add a keyword

Passage Scoring

Passages are scored based on keyword windows

For example, if a question has a set of keywords: {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:

[Figure: four keyword windows built over the matched keyword sequence k1 k2 k3 k2 k1, one window for each combination of occurrences of the repeated keywords k1 and k2]

Passage Scoring

Passage ordering is performed using a sort that involves three scores:

The number of words from the question that are recognized in the same sequence in the window

The number of words that separate the most distant keywords in the window

The number of unmatched keywords in the window
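A hedged sketch of computing these three scores for a keyword window (tokenization and the subsequence matching are assumptions; the exact LCC weighting is not reproduced):

```python
def window_scores(window_tokens, question_keywords):
    """Return the three scores used to sort keyword windows.

    same_order: number of question keywords found in the window in the same
                relative order as in the question (greedy subsequence match,
                a simplification).
    span:       number of words separating the most distant matched keywords.
    unmatched:  number of question keywords not found in the window.
    """
    positions = [i for i, tok in enumerate(window_tokens) if tok in question_keywords]
    matched = [window_tokens[i] for i in positions]
    unmatched = len([k for k in question_keywords if k not in window_tokens])
    span = (positions[-1] - positions[0]) if len(positions) > 1 else 0

    same_order, qi = 0, 0
    for tok in matched:
        if qi < len(question_keywords) and tok == question_keywords[qi]:
            same_order += 1
            qi += 1
    return same_order, span, unmatched

# passages can then be ordered by (-same_order, span, unmatched)
```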

Answer Extraction

Extracts and ranks answers using NL techniques

[The LCC block architecture diagram is repeated here, with the Answer Processing module (Answer Extraction, Theorem Prover, Answer Justification, Answer Reranking) highlighted]

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.

Answer type: Person

Text passage: “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.

Answer type: Person

Text passage (candidate persons highlighted): “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Best candidate answer: Christa McAuliffe

Features for Answer Ranking

Number of question terms matched in the answer passage

Number of question terms matched in the same phrase as the candidate answer

Number of question terms matched in the same sentence as the candidate answer

Flag set to 1 if the candidate answer is followed by a punctuation sign

Number of question terms matched, separated from the candidate answer by at most three words and one comma

Number of terms occurring in the same order in the answer passage as in the question

Average distance from candidate answer to question term matches
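A minimal sketch computing a few of these features for a candidate answer span, assuming tokenized passages and a known answer span (names and the punctuation set are illustrative, not the LCC feature extractor):

```python
def answer_features(passage_tokens, question_terms, answer_start, answer_end):
    """Compute some of the ranking features for a candidate answer.

    passage_tokens: list of tokens of the answer passage.
    question_terms: set of lowercased question terms.
    answer_start, answer_end: token span of the candidate answer in the passage.
    """
    matches = [i for i, tok in enumerate(passage_tokens) if tok.lower() in question_terms]

    num_matched = len(matches)

    # question terms within three words of the candidate answer
    near_answer = sum(1 for i in matches
                      if answer_start - 3 <= i < answer_start or answer_end < i <= answer_end + 3)

    # average distance from the candidate answer to the question-term matches
    if matches:
        avg_dist = sum(min(abs(i - answer_start), abs(i - answer_end)) for i in matches) / len(matches)
    else:
        avg_dist = float("inf")

    # candidate followed by a punctuation sign
    followed_by_punct = int(answer_end + 1 < len(passage_tokens)
                            and passage_tokens[answer_end + 1] in {",", ".", ";", ":", "!", "?"})

    return {
        "num_matched": num_matched,
        "near_answer": near_answer,
        "avg_dist": avg_dist,
        "followed_by_punct": followed_by_punct,
    }
```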

Lexical Chains

Question: When was the internal combustion engine invented?

Answer: The first internal combustion engine was built in 1867.

Lexical chain: invent:v#1 → HYPERNYM → create_by_mental_act:v#1 → HYPERNYM → create:v#1 → HYPONYM → build:v#1

Question: How many chromosomes does a human zygote have?

Answer: 46 chromosomes lie in the nucleus of every normal human cell.

Lexical chain: zygote:n#1 → HYPERNYM → cell:n#1 → HAS.PART → nucleus:n#1

Theorem Prover

Q: What is the age of the solar system?

QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)

Question Axiom: exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3))

Answer: The solar system is 4.6 billion years old.

WordNet Gloss: old_jj(x6) → live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)

Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))

Proof (by refutation): ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3); the refutation assigns a value to x2

Is the Web Different?

In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.

The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.

But…

The Web is Different

On the Web popular factoids are likely to be expressed in a gazillion different ways.

At least a few of which will likely match the way the question was asked.

So why not just grep (or agrep) the Web using all or pieces of the original question?

AskMSR

Process the question by…

Forming a search engine query from the original question

Detecting the answer type

Get some results

Extract answers of the right type, based on how often they occur

Step 1: Rewrite the questions

Intuition: The user’s question is often syntactically quite close to sentences that contain the answer

Where is the Louvre Museum located?

The Louvre Museum is located in Paris

Who created the character of Scrooge?

Charles Dickens created the character of Scrooge.

Query rewriting

Classify question into seven categories

Who is/was/are/were…?

When is/did/will/are/were…?

Where is/are/were…?

Hand-crafted category-specific transformation rules. E.g., for Where questions, move 'is' to all possible positions and look to the right of the query terms for the answer:

“Where is the Louvre Museum located?” →

“is the Louvre Museum located”

“the is Louvre Museum located”

“the Louvre is Museum located”

“the Louvre Museum is located”

“the Louvre Museum located is”
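A minimal sketch of this "move 'is' to all possible positions" rewrite for Where questions (the function name and the handling of other categories are illustrative; rewrite weights are omitted):

```python
def rewrite_where_question(question):
    """Generate declarative rewrites by moving 'is' to every position.

    'Where is the Louvre Museum located?' ->
        'is the Louvre Museum located', 'the is Louvre Museum located', ...
    """
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "where" or words[1].lower() != "is":
        return [question]                      # not a "Where is ..." question
    rest = words[2:]                           # drop "Where" and "is"
    rewrites = []
    for pos in range(len(rest) + 1):
        candidate = rest[:pos] + ["is"] + rest[pos:]
        rewrites.append(" ".join(candidate))
    return rewrites

print(rewrite_where_question("Where is the Louvre Museum located?"))
# ['is the Louvre Museum located', 'the is Louvre Museum located', ...,
#  'the Louvre Museum located is']
```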

Step 2: Query search engine

Send all rewrites to a Web search engine

Retrieve top N answers (100-200)

For speed, rely just on search engine’s “snippets”, not the full text of the actual document

Step 3: Gathering N-Grams

Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets

Weight of an n-gram: its occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document

Example: “Who created the character of Scrooge?”

Dickens 117
Christmas Carol 78
Charles Dickens 75
Disney 72
Carl Banks 54
A Christmas 41
Christmas Carol 45
Uncle 31
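A hedged sketch of the n-gram gathering and weighting step, assuming each snippet arrives with the weight of the rewrite rule that fetched it (the weights and snippets below are made up for illustration):

```python
from collections import Counter

def gather_ngrams(snippets_with_weights, max_n=3):
    """Count unigrams, bigrams and trigrams across snippets, weighting each
    occurrence by the reliability of the rewrite rule that fetched the snippet."""
    scores = Counter()
    for snippet, rule_weight in snippets_with_weights:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram = " ".join(tokens[i:i + n])
                scores[ngram] += rule_weight
    return scores

snippets = [
    ("Charles Dickens created the character of Scrooge", 5),   # from an exact rewrite
    ("Scrooge is a character created by Charles Dickens", 2),  # from a looser query
]
print(gather_ngrams(snippets).most_common(5))
```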

Step 4: Filtering N-Grams

Each question type is associated with one or more “data-type filters” = regular expressions for answer types

Boost the score of n-grams that match the expected answer type

Lower the score of n-grams that don't match
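A minimal sketch of such data-type filters as regular expressions keyed by question type (the patterns, boost and penalty values are illustrative assumptions, not AskMSR's actual filters):

```python
import re

# illustrative filters: one regular expression per expected answer type
DATA_TYPE_FILTERS = {
    "how-many": re.compile(r"^\d[\d,.]*$"),                     # numbers
    "when":     re.compile(r"^(1[0-9]{3}|20[0-9]{2})$"),        # plausible years
    "who":      re.compile(r"^([A-Z][a-z]+)( [A-Z][a-z]+)*$"),  # capitalized names
}

def filter_ngrams(ngram_scores, question_type, boost=2.0, penalty=0.5):
    """Boost n-grams matching the expected answer type, lower the rest."""
    pattern = DATA_TYPE_FILTERS.get(question_type)
    if pattern is None:
        return dict(ngram_scores)
    return {ng: score * (boost if pattern.match(ng) else penalty)
            for ng, score in ngram_scores.items()}
```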

Step 5: Tiling the Answers

Example: the candidate n-grams "Charles Dickens" (score 20), "Dickens" (15) and "Mr Charles" (10) are merged into "Mr Charles Dickens" (score 45), and the old n-grams are discarded.
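A hedged sketch of greedy answer tiling under the assumption that overlapping or contained n-grams are merged and their scores summed (a simplification of the AskMSR tiling step):

```python
def tile_once(ngrams):
    """Merge the first pair of n-grams that overlap or where one contains the other.

    ngrams: dict mapping n-gram string -> score.
    Returns (new_ngrams, merged_flag).
    """
    items = list(ngrams.items())
    for i, (a, sa) in enumerate(items):
        for b, sb in items[i + 1:]:
            merged = _merge(a, b)
            if merged is not None:
                new = dict(ngrams)
                del new[a]
                del new[b]
                new[merged] = new.get(merged, 0) + sa + sb
                return new, True
    return ngrams, False

def _merge(a, b):
    """Return the tiled string if a and b overlap (in either order), else None."""
    wa, wb = a.split(), b.split()
    for x, y in ((wa, wb), (wb, wa)):
        if " ".join(y) in " ".join(x):           # simple substring containment
            return " ".join(x)
        for k in range(1, min(len(x), len(y))):  # suffix of x == prefix of y
            if x[-k:] == y[:k]:
                return " ".join(x + y[k:])
    return None

ngrams = {"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}
while True:
    ngrams, merged = tile_once(ngrams)
    if not merged:
        break
print(ngrams)   # {'Mr Charles Dickens': 45}
```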

Results

Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions

Technique does ok, not great (would have placed in top 9 of ~30 participants)

But with access to the Web… they do much better, would have come in second on TREC 2001

Harder Questions

Factoid question answering is really pretty silly.

A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.

Who is Condoleezza Rice?

Who is Mahmoud Abbas?

Why was Arafat flown to Paris?

IXE Components

IXE Framework

[Component diagram: Passage Index, NE Tagger, EventStream, ContextStream, Sentence Splitter, POS Tagger, Python/Perl/Java wrappers, Clustering, Files, Memory Mapping, Crawler, Synchronization, MaxEntropy, Indexer, Web Service, Unicode, Wrappers, RegExp Search, Suffix Trees, Readers, OS Abstraction, Object Store, Text]

Language Processing Tools

Maximum Entropy classifier

Sentence Splitter

Multi-language POS Tagger

Multi-language NE Tagger

Conceptual clustering

Maximum Entropy

Machine Learning approach to classification:

System trained on test cases

Learned model used for predictions

Classification problem described as a number of features

Each feature corresponds to a constraint on the model

Maximum entropy model: the model with the maximum entropy among all the models that satisfy the constraints

Choosing a model with less entropy would add 'information' constraints not justified by the empirical evidence available

MaxEntropy: example data

Features → Outcome

Sunny, Happy → Outdoor
Sunny, Happy, Dry → Outdoor
Sunny, Happy, Humid → Outdoor
Sunny, Sad, Dry → Outdoor
Sunny, Sad, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Rainy, Happy, Humid → Indoor
Rainy, Happy, Dry → Indoor
Rainy, Sad, Dry → Indoor
Rainy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor

MaxEnt: example predictions

Context: Cloudy, Happy, Humid → Outdoor 0.771, Indoor 0.228

Context: Rainy, Sad, Humid → Outdoor 0.001, Indoor 0.998

MaxEntropy: application

Sentence Splitting: not all punctuation marks are sentence boundaries:

U.S.A.

St. Helen

3.14

Use features like:

Capitalization (previous, next word)

Presence in abbreviation list

Suffix/prefix digits

Suffix/prefix length

Precision: > 95%
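A minimal sketch of the kind of features such a sentence splitter might compute for a candidate boundary (the abbreviation list and feature names are illustrative assumptions):

```python
ABBREVIATIONS = {"U.S.A", "St", "Mr", "Dr", "Prof"}   # illustrative, not a full list

def boundary_features(tokens, i):
    """Features for deciding whether the period after tokens[i] ends a sentence."""
    prev_tok = tokens[i]
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "prev_capitalized": prev_tok[:1].isupper(),
        "next_capitalized": next_tok[:1].isupper(),
        "prev_is_abbreviation": prev_tok.rstrip(".") in ABBREVIATIONS,
        "prev_has_digits": any(c.isdigit() for c in prev_tok),
        "next_has_digits": any(c.isdigit() for c in next_tok),
        "prev_is_short": len(prev_tok.rstrip(".")) <= 3,
    }

# e.g. boundary_features(["near", "St.", "Helen", "today", "."], 1)
# -> prev_is_abbreviation=True, suggesting this period is not a sentence boundary
```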

Part of Speech Tagging

TreeTagger: a statistical package based on HMMs and decision trees

Trained on manually tagged text

Full language lexicon (with all inflections: 140,000 words for Italian)

Training Corpus

Il → DET:def:*:*:masc:sg → _il
presidente → NOM:*:*:*:masc:sg → _presidente
della → PRE:det:*:*:femi:sg → _del
Repubblica → NOM:*:*:*:femi:sg → _repubblica
francese → ADJ:*:*:*:femi:sg → _francese
Francois → NPR:*:*:*:*:* → _Francois
Mitterrand → NPR:*:*:*:*:* → _Mitterrand
ha → VER:aux:pres:3:*:sg → _avere
proposto → VER:*:pper:*:masc:sg → _proporre
…

Named Entity Tagger

Uses MaxEntropy

NE categories, top level: NAME, ORGANIZATION, LOCATION, QUANTITY, TIME, EVENT, PRODUCT

Second level: 30-100 categories, e.g. QUANTITY: MONEY, CARDINAL, PERCENT, MEASURE, VOLUME, AGE, WEIGHT, SPEED, TEMPERATURE, etc.

See resources at CoNLL (cnts.uia.ac.be/connl2004)

NE Features

Feature types:

word-level (e.g. capitalization, digits, etc.)

punctuation

POS tag

category designator (Mr, Av.)

category suffix (center, museum, street, etc.)

lowercase intermediate terms (of, de, in)

presence in controlled dictionaries (locations, people, organizations)

Context: words in positions -1, 0, +1

Sample training document

Today the Dow Jones industrial average gained thirty-eight and three quarter points.

When the first American style burger joint opened in London's fashionable Regent street some twenty years ago, it was mobbed.

Now it's Asia's turn.

The temperatures hover in the nineties, the heat index climbs into the hundreds.

And that's continued bad news for Florida where wildfires have charred nearly three hundred square miles in the last month and destroyed more than a hundred homes.

Clustering

Classification: assign an item to one among a given set of classes

Clustering: find groupings of similar items (i.e. generate the classes )

Conceptual Clustering of results

Similar to Vivisimo

Built on the fly rather than from predefined categories (Northern Light)

Generalized suffix tree of snippets

Stemming

Stop words (articulated, essential)

Demo: python, upnp

PiQASso: Pisa Question Answering System “Computers are useless, they can only give answers” Pablo Picasso

PiQASso Architecture

[Architecture diagram: the document collection is split into sentences, parsed with MiniPar and indexed (Answer Indexer); question analysis (MiniPar parse, Question Classification, Query Formulation/Expansion with WNSense and WordNet) produces a query; answer analysis (MiniPar parse, Type Matching, Relation Matching, Answer Scoring, Popularity Ranking) filters the retrieved paragraphs, looping until an answer is found]

Linguistic tools

WNSense extracts lexical knowledge from WordNet:

classifies words according to WordNet top-level categories, weighting their senses

computes distance between words based on is-a links

suggests word alternatives for query expansion

Example: theatre. Categorization: artifact 0.60, communication 0.40. Synonyms: dramaturgy, theater, house, dramatics

Minipar [D. Lin]

Identifies dependency relations between words (e.g. subject, object, modifiers)

Provides POS tagging

Detects semantic types of words (e.g. location, person, organization)

Extensible: we integrated a Maximum Entropy based Named Entity Tagger

[Dependency parse of "What metal has the highest melting point?" showing the subj, obj, lex-mod and mod relations]

Question Analysis

Example: What metal has the highest melting point?

1. Parsing: [dependency tree with subj, obj, lex-mod, mod relations]
2. Keyword extraction: metal, highest, melting, point
3. Answer type detection: SUBSTANCE
4. Relation extraction: <SUBSTANCE, has, subj>, <point, has, obj>, <melting, point, lex-mod>, <highest, point, mod>

Steps:

1. The NL question is parsed
2. POS tags are used to select search keywords
3. The expected answer type is determined by applying heuristic rules to the dependency tree
4. Additional relations are inferred and the answer entity is identified

Answer Analysis

Example: Tungsten is a very dense material and has the highest melting point of any metal.

1. Parsing
2. Answer type check: SUBSTANCE
3. Relation extraction: <tungsten, material, pred>, <tungsten, has, subj>, <point, has, obj>, …
4. Matching distance
5. Distance filtering
6. Popularity ranking → ANSWER: Tungsten

Steps:

1. Parse the retrieved paragraphs
2. Paragraphs not containing an entity of the expected type are discarded
3. Dependency relations are extracted from the MiniPar output
4. The matching distance between word relations in the question and the answer is computed
5. Paragraphs that are too distant are filtered out
6. Popularity rank is used to weight distances

Match Distance between Question and Answer

Analyze relations between corresponding words considering:

number of matching words in question and in answer

distance between words. Ex: moon matching with satellite

relation types. Ex: words in the question related by subj while the matching words in the answer are related by pred
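A minimal sketch of a matching distance over dependency triples along these lines (the word-similarity and relation-compatibility functions are placeholders to be supplied, e.g. via WordNet; this is not PiQASso's actual metric):

```python
def match_distance(question_rels, answer_rels, word_sim, rel_compatible):
    """Distance between question and answer dependency relations.

    question_rels, answer_rels: lists of (head, dependent, relation) triples.
    word_sim(w1, w2):        similarity in [0, 1], e.g. based on WordNet is-a links
                             (so 'moon' can match 'satellite').
    rel_compatible(r1, r2):  True if the relation types are considered compatible
                             (e.g. 'subj' in the question vs. 'pred' in the answer).
    """
    distance = 0.0
    for q_head, q_dep, q_rel in question_rels:
        best = None
        for a_head, a_dep, a_rel in answer_rels:
            if not rel_compatible(q_rel, a_rel):
                continue
            sim = word_sim(q_head, a_head) * word_sim(q_dep, a_dep)
            if best is None or sim > best:
                best = sim
        # unmatched question relations add the maximum penalty
        distance += (1.0 - best) if best is not None else 1.0
    return distance / max(len(question_rels), 1)
```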

http://medialab.di.unipi.it/askpiqasso.html

Improving PIQASso

More NLP

NLP techniques largely unsuccessful at information retrieval

Document retrieval as the primary measure of information retrieval success

Document retrieval reduces the need for NLP techniques:

Discourse factors can be ignored

Query words perform word-sense disambiguation

Lack of robustness: NLP techniques are typically not as robust as word indexing

How do these technologies help?

Question Analysis: the tag of the predicted category is added to the query

Named-Entity Detection: the NE categories found in the text are included as tags in the index

Example: What party is John Kerry in? (ORGANIZATION)

"John Kerry defeated John Edwards in the primaries for the Democratic Party."

Tags: PERSON, ORGANIZATION

NLP Technologies

Coreference Relations:

Interpretation of a paragraph may depend on the context in which it occurs

Description Extraction:

Appositive and predicate nominative constructions provide descriptive terms about entities

Coreference Relations

Represented as annotations associated with words, i.e. placed at the same positions as the referring expression.

Example: How long was Margaret Thatcher the prime minister? (DURATION)

"The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore."

Tags: DURATION. Colocated: her → MARGARET THATCHER

Description Extraction

Identifies a DESCRIPTION category

Allows descriptive terms to be used in term expansion

Example: Who is Frank Gary? (DESCRIPTION); What architect designed the Guggenheim Museum in Bilbao? (PERSON)

"Famed architect Frank Gary… Buildings he designed include the Guggenheim Museum in Bilbao."

Tags: DESCRIPTION, PERSON, LOCATION. Colocation: he → FRANK GARY

NLP Technologies

Question Analysis:

identify the semantic type of the expected answer implicit in the query

Named-Entity Detection:

determine the semantic type of proper nouns and numeric amounts in text

Will it work?

Will these semantic relations improve paragraph retrieval?

Are the implementations robust enough to see a benefit across large document collections and question sets?

Are there enough questions where these relationships are required to find an answer?

Hopefully yes!

Preprocessing

Paragraph Detection

Sentence Detection

Tokenization

POS Tagging

NP-Chunking

Queries to a NE enhanced index

text matches bush

text matches PERSON:bush

text matches LOCATION:* & PERSON:bin-laden

text matches DURATION:* PERSON:margaret-thatcher prime minister

Coreference

Task:

Determine space of entity extents:

Basal noun phrases:

Named entities consisting of multiple basal noun phrases are treated as a single entity

Pre-nominal proper nouns

Possessive pronouns

Determine which extents refer to the same entity in the world

Paragraph Retrieval

Indexing:

add NE tags for each NE category present in the text

add coreference relationships

Use syntactically-based categorical relations to create a DESCRIPTION category for term expansion

Use IXE passage indexer

High Composability

[Class diagram: DocInfo (name, date, size), PassageDoc (text, boundaries), Collection, Cursor.next(), QueryCursor.next(), PassageQueryCursor.next()]

Tagged Documents

select documents where:

text matches bush

text matches PERSON:bush

text matches osama & LOCATION:*

Cursor classes: QueryCursor, QueryCursorWord, QueryCursorTaggedWord

Combination

Searching passages on a collection of tagged documents: a PassageQueryCursor composed over a QueryCursor

Paragraph Retrieval

Retrieval:

Use question analysis component to predict answer category and append it to the question

Evaluate using TREC questions and answer patterns

500 questions

System Overview

Indexing: Documents → Paragraph Splitter → Sentence Splitter → Tokenization → POS Tagger → NE Recognizer → Coreference Resolution → Description Extraction → IXE Indexer → Paragraphs+

Retrieval: Question → Question Analysis → IXE Search → Paragraphs

Conclusion

QA is a challenging task

Involves state-of-the-art techniques in various fields:

IR

NLP

AI

Managing large data sets

Advanced Software Technologies