Question Answering


Search Engines & Question Answering Giuseppe Attardi

Dipartimento di Informatica, Università di Pisa

Question Answering

IR: find documents relevant to query

query: boolean combination of keywords

QA: find answer to question

Question: expressed in natural language

Answer: short phrase (< 50 bytes)

TREC-9 Q&A track

693 fact-based, short answer questions

either short (50 B) or long (250 B) answer

~3 GB newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)

Score: MRR (mean reciprocal rank: a correct answer at rank r earns 1/r, so an answer found only in second place gets half credit; a computation sketch follows this slide)

Resources: top 50 retrieved documents per question (no answer found for 130 questions)

Questions: 186 (Encarta), 314 (seeds from Excite logs), 193 (syntactic variants of 54 originals)
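For reference, a minimal sketch of how MRR can be computed from ranked answer lists (function and variable names are illustrative; the five-answer cutoff follows the TREC convention):

```python
def mean_reciprocal_rank(ranked_answers, is_correct, max_rank=5):
    """MRR over a set of questions.

    ranked_answers: dict mapping question id -> list of answer strings,
                    best answer first (at most max_rank are considered).
    is_correct:     function (question_id, answer) -> bool.
    A question scores 1/r for the highest-ranked correct answer at rank r,
    and 0 if none of the top answers is correct.
    """
    total = 0.0
    for qid, answers in ranked_answers.items():
        for rank, answer in enumerate(answers[:max_rank], start=1):
            if is_correct(qid, answer):
                total += 1.0 / rank
                break
    return total / len(ranked_answers)
```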

Commonalities

Approaches:

question classification

finding entailed answer type

use of WordNet

High-quality document search helpful (e.g. Queens College)

Sample Questions

Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth

Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270

Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes

Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)

Q: Which country has the largest part of the rain forest?
A: Brazil (60%)

Question Types

Class 1. A: single datum or list of items. C: who, when, where, how (old, much, large)

Class 2. A: multi-sentence. C: extracted from multiple sentences

Class 3. A: across several texts. C: comparative/contrastive

Class 4. A: an analysis of retrieved information. C: synthesized coherently from several retrieved fragments

Class 5. A: result of reasoning. C: world/domain knowledge and common sense reasoning

Question subtypes

Class 1.A: about subjects, objects, manner, time or location

Class 1.B: about properties or attributes

Class 1.C: taxonomic nature

Results (long)

[Bar chart: MRR for long answers by system: SMU, Queens, Waterloo, IBM, LIMSI, NTT, IC, Pisa (unofficial); scale 0 to 0.8]

Falcon: Architecture

Question Processing: Question → Collins Parser + NE Extraction → Question Semantic Form → Question Taxonomy → Expected Answer Type; WordNet-based Question Expansion; Question Logical Form

Paragraph Processing: Paragraph Index → Paragraph Filtering → Answer Paragraphs

Answer Processing: Collins Parser + NE Extraction → Answer Semantic Form → Answer Logical Form → Coreference Resolution → Abduction Filter → Answer

Question parse

[Parse tree for the question "Who was the first Russian astronaut to walk in space", with phrase nodes S, NP, VP, PP and POS tags WP, VBD, DT, JJ, NNP, NN, TO, VB, IN]

Question semantic form

Semantic form (concepts): first, Russian, astronaut, walk, space

Expected answer type: PERSON

Question logic form: first(x) & astronaut(x) & Russian(x) & space(z) & walk(y, z, x) & PERSON(x)

Expected Answer Type

WordNet

Example: What is the size of Argentina? The question term "size" maps through the WordNet hierarchy (size → dimension → QUANTITY), so the expected answer type is QUANTITY.

Questions about definitions

Special patterns:

What {is|are} …?

What is the definition of …?

Who {is|was|are|were} …?

Answer patterns:

…{is|are}

…, {a|an|the}

… -

Question Taxonomy

[Taxonomy tree rooted at Question, with categories including Location, Reason, Product, Nationality, Manner, Number, Currency, Language, Mammal, Reptile, Game, Organization, Country, City, Province, Continent, Speed, Degree, Dimension, Rate, Duration, Percentage, Count]

Question expansion

Morphological variants: invented → inventor

Lexical variants: killer → assassin; far → distance

Semantic variants: like → prefer

Indexing for Q/A

Alternatives:

IR techniques

Parse texts and derive conceptual indexes

Falcon uses paragraph indexing:

Vector-Space plus proximity

Returns weights used for abduction

Abduction to justify answers

Backchaining proofs from questions

Axioms:

Logical form of answer

World knowledge (WordNet)

Coreference resolution in answer text

Effectiveness:

14% improvement

Filters 121 erroneous answers (of 692)

Requires 60% of question processing time

TREC 13 QA

Several subtasks:

Factoid questions

Definition questions

List questions

Context questions

LCC still has the best performance, but with a different architecture

LCC Block Architecture

Question Processing (uses NER and WordNet): Question Parse → Semantic Transformation → Recognition of Expected Answer Type → Keyword Extraction. Captures the semantics of the question and selects keywords for passage retrieval, producing Question Semantics and Keywords.

Passage Retrieval: Document Retrieval → Passages. Extracts and ranks passages using surface-text techniques.

Answer Processing (uses NER and WordNet): Answer Extraction → Answer Justification with a Theorem Prover and an Axiomatic Knowledge Base → Answer Reranking. Extracts and ranks answers using NL techniques to produce the answer A.

Question Processing

Two main tasks

Determining the type of the answer

Extract keywords from the question and formulate a query

Answer Types

Factoid questions…

Who, where, when, how many… The answers fall into a limited and somewhat predictable set of categories:

Who questions are going to be answered by a person (or organization)

Where questions by a location

Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract

Answer Types

Of course, it isn’t that easy…

Who questions can have organizations as answers

Who sells the most hybrid cars?

Which questions can have people as answers

Which president went to war with Mexico?

Answer Type Taxonomy

Contains ~9000 concepts reflecting expected answer types

Merges named entities with the WordNet hierarchy

Answer Type Detection

Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.

Not worthwhile to do something complex here if it can’t also be done in candidate answer passages.

Keyword Selection

Answer Type indicates what the question is looking for:

It can be mapped to a NE type and used for search in enhanced index

Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations provide the required context.

Keyword Extraction

Questions are approximated by sets of unrelated keywords. Examples (from the TREC QA track):

Q002: What was the monetary value of the Nobel Peace Prize in 1989?
Keywords: monetary, value, Nobel, Peace, Prize

Q003: What does the Peugeot company manufacture?
Keywords: Peugeot, company, manufacture

Q004: How much did Mercury spend on advertising in 1993?
Keywords: Mercury, spend, advertising, 1993

Q005: What is the name of the managing director of Apricot Computer?
Keywords: name, managing, director, Apricot, Computer

Keyword Selection Algorithm

1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
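A minimal sketch of how these ordered heuristics could be applied, assuming the question has already been parsed into tokens carrying POS tags and named-entity and quotation flags; heuristics 3 to 6 are collapsed into a single noun rule here, and all names are illustrative rather than the LCC implementation:

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str            # e.g. "NNP", "NN", "JJ", "VB"
    in_quotes: bool = False
    in_named_entity: bool = False
    stopword: bool = False

def select_keywords(tokens, answer_type_word, max_keywords=6):
    """Apply the ordered heuristics until enough keywords are collected."""
    heuristics = [
        lambda t: t.in_quotes and not t.stopword,             # 1. non-stopwords in quotations
        lambda t: t.pos == "NNP" and t.in_named_entity,       # 2. NNP words in named entities
        lambda t: t.pos.startswith("NN") and not t.stopword,  # 3-6. nominals and nouns (collapsed)
        lambda t: t.pos.startswith("VB"),                     # 7. verbs
    ]
    keywords = []
    for rule in heuristics:
        for tok in tokens:
            if rule(tok) and tok.text not in keywords:
                keywords.append(tok.text)
        if len(keywords) >= max_keywords:
            break
    keywords.append(answer_type_word)                         # 8. the answer type word
    return keywords
```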

Passage Retrieval

Extracts and ranks passages using surface-text techniques

[The LCC block architecture diagram is repeated here, with the Passage Retrieval module (Document Retrieval → Passages) highlighted]

Passage Extraction Loop

Passage Extraction Component:

Extracts passages that contain all selected keywords

Passage size is dynamic

Start position is dynamic

Passage quality and keyword adjustment:

In the first iteration, use the first six keyword selection heuristics

If the number of passages is lower than a threshold, the query is too strict: drop a keyword

If the number of passages is higher than a threshold, the query is too relaxed: add a keyword

Passage Scoring

Passages are scored based on keyword windows

For example, if a question has a set of keywords: {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:

[Figure: four keyword windows built over the matched keyword sequence k1 k2 k3 k2 k1, one window for each combination of occurrences of the repeated keywords k1 and k2]

Passage Scoring

Passage ordering is performed using a sort that involves three scores:

The number of words from the question that are recognized in the same sequence in the window

The number of words that separate the most distant keywords in the window

The number of unmatched keywords in the window
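A hedged sketch of computing these three scores for a keyword window (tokenization and the subsequence matching are assumptions; the exact LCC weighting is not reproduced):

```python
def window_scores(window_tokens, question_keywords):
    """Return the three scores used to sort keyword windows.

    same_order: number of question keywords found in the window in the same
                relative order as in the question (greedy subsequence match,
                a simplification).
    span:       number of words separating the most distant matched keywords.
    unmatched:  number of question keywords not found in the window.
    """
    positions = [i for i, tok in enumerate(window_tokens) if tok in question_keywords]
    matched = [window_tokens[i] for i in positions]
    unmatched = len([k for k in question_keywords if k not in window_tokens])
    span = (positions[-1] - positions[0]) if len(positions) > 1 else 0

    same_order, qi = 0, 0
    for tok in matched:
        if qi < len(question_keywords) and tok == question_keywords[qi]:
            same_order += 1
            qi += 1
    return same_order, span, unmatched

# passages can then be ordered by (-same_order, span, unmatched)
```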

Answer Extraction

Extracts and ranks answers using NL techniques

[The LCC block architecture diagram is repeated here, with the Answer Processing module (Answer Extraction, Theorem Prover, Answer Justification, Answer Reranking) highlighted]

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.

Answer type: Person

Text passage: “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.

Answer type: Person

Text passage (candidate persons highlighted): “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Best candidate answer: Christa McAuliffe

Features for Answer Ranking

Number of question terms matched in the answer passage

Number of question terms matched in the same phrase as the candidate answer

Number of question terms matched in the same sentence as the candidate answer

Flag set to 1 if the candidate answer is followed by a punctuation sign

Number of question terms matched, separated from the candidate answer by at most three words and one comma

Number of terms occurring in the same order in the answer passage as in the question

Average distance from candidate answer to question term matches
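A minimal sketch computing a few of these features for a candidate answer span, assuming tokenized passages and a known answer span (names and the punctuation set are illustrative, not the LCC feature extractor):

```python
def answer_features(passage_tokens, question_terms, answer_start, answer_end):
    """Compute some of the ranking features for a candidate answer.

    passage_tokens: list of tokens of the answer passage.
    question_terms: set of lowercased question terms.
    answer_start, answer_end: token span of the candidate answer in the passage.
    """
    matches = [i for i, tok in enumerate(passage_tokens) if tok.lower() in question_terms]

    num_matched = len(matches)

    # question terms within three words of the candidate answer
    near_answer = sum(1 for i in matches
                      if answer_start - 3 <= i < answer_start or answer_end < i <= answer_end + 3)

    # average distance from the candidate answer to the question-term matches
    if matches:
        avg_dist = sum(min(abs(i - answer_start), abs(i - answer_end)) for i in matches) / len(matches)
    else:
        avg_dist = float("inf")

    # candidate followed by a punctuation sign
    followed_by_punct = int(answer_end + 1 < len(passage_tokens)
                            and passage_tokens[answer_end + 1] in {",", ".", ";", ":", "!", "?"})

    return {
        "num_matched": num_matched,
        "near_answer": near_answer,
        "avg_dist": avg_dist,
        "followed_by_punct": followed_by_punct,
    }
```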

Lexical Chains

Question: When was the internal combustion engine invented?

Answer: The first internal combustion engine was built in 1867.

Lexical chain: invent:v#1 → HYPERNYM → create_by_mental_act:v#1 → HYPERNYM → create:v#1 → HYPONYM → build:v#1

Question: How many chromosomes does a human zygote have?

Answer: 46 chromosomes lie in the nucleus of every normal human cell.

Lexical chain: zygote:n#1 → HYPERNYM → cell:n#1 → HAS.PART → nucleus:n#1

Theorem Prover

Q: What is the age of the solar system?

QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)

Question Axiom: exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3))

Answer: The solar system is 4.6 billion years old.

WordNet Gloss: old_jj(x6) → live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)

Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))

Proof (by refutation): ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3); the refutation assigns a value to x2

Is the Web Different?

In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.

The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.

But…

The Web is Different

On the Web popular factoids are likely to be expressed in a gazillion different ways.

At least a few of which will likely match the way the question was asked.

So why not just grep (or agrep) the Web using all or pieces of the original question?

AskMSR

Process the question by…

Forming a search engine query from the original question

Detecting the answer type

Get some results

Extract answers of the right type, based on how often they occur

Step 1: Rewrite the questions

Intuition: The user’s question is often syntactically quite close to sentences that contain the answer

Where is the Louvre Museum located?

The Louvre Museum is located in Paris

Who created the character of Scrooge?

Charles Dickens created the character of Scrooge.

Query rewriting

Classify question into seven categories

Who is/was/are/were…?

When is/did/will/are/were…?

Where is/are/were…?

Hand-crafted category-specific transformation rules. E.g., for Where questions, move 'is' to all possible positions and look to the right of the query terms for the answer:

“Where is the Louvre Museum located?” →

“is the Louvre Museum located”

“the is Louvre Museum located”

“the Louvre is Museum located”

“the Louvre Museum is located”

“the Louvre Museum located is”
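A minimal sketch of this "move 'is' to all possible positions" rewrite for Where questions (the function name and the handling of other categories are illustrative; rewrite weights are omitted):

```python
def rewrite_where_question(question):
    """Generate declarative rewrites by moving 'is' to every position.

    'Where is the Louvre Museum located?' ->
        'is the Louvre Museum located', 'the is Louvre Museum located', ...
    """
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "where" or words[1].lower() != "is":
        return [question]                      # not a "Where is ..." question
    rest = words[2:]                           # drop "Where" and "is"
    rewrites = []
    for pos in range(len(rest) + 1):
        candidate = rest[:pos] + ["is"] + rest[pos:]
        rewrites.append(" ".join(candidate))
    return rewrites

print(rewrite_where_question("Where is the Louvre Museum located?"))
# ['is the Louvre Museum located', 'the is Louvre Museum located', ...,
#  'the Louvre Museum located is']
```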

Step 2: Query search engine

Send all rewrites to a Web search engine

Retrieve top N answers (100-200)

For speed, rely just on search engine’s “snippets”, not the full text of the actual document

Step 3: Gathering N-Grams

Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets

Weight of an n-gram: its occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document

Example: “Who created the character of Scrooge?”

Dickens 117
Christmas Carol 78
Charles Dickens 75
Disney 72
Carl Banks 54
A Christmas 41
Christmas Carol 45
Uncle 31
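A hedged sketch of the n-gram gathering and weighting step, assuming each snippet arrives with the weight of the rewrite rule that fetched it (the weights and snippets below are made up for illustration):

```python
from collections import Counter

def gather_ngrams(snippets_with_weights, max_n=3):
    """Count unigrams, bigrams and trigrams across snippets, weighting each
    occurrence by the reliability of the rewrite rule that fetched the snippet."""
    scores = Counter()
    for snippet, rule_weight in snippets_with_weights:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram = " ".join(tokens[i:i + n])
                scores[ngram] += rule_weight
    return scores

snippets = [
    ("Charles Dickens created the character of Scrooge", 5),   # from an exact rewrite
    ("Scrooge is a character created by Charles Dickens", 2),  # from a looser query
]
print(gather_ngrams(snippets).most_common(5))
```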

Step 4: Filtering N-Grams

Each question type is associated with one or more “data-type filters” = regular expressions for answer types

Boost the score of n-grams that match the expected answer type

Lower the score of n-grams that don't match
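A minimal sketch of such data-type filters as regular expressions keyed by question type (the patterns, boost and penalty values are illustrative assumptions, not AskMSR's actual filters):

```python
import re

# illustrative filters: one regular expression per expected answer type
DATA_TYPE_FILTERS = {
    "how-many": re.compile(r"^\d[\d,.]*$"),                     # numbers
    "when":     re.compile(r"^(1[0-9]{3}|20[0-9]{2})$"),        # plausible years
    "who":      re.compile(r"^([A-Z][a-z]+)( [A-Z][a-z]+)*$"),  # capitalized names
}

def filter_ngrams(ngram_scores, question_type, boost=2.0, penalty=0.5):
    """Boost n-grams matching the expected answer type, lower the rest."""
    pattern = DATA_TYPE_FILTERS.get(question_type)
    if pattern is None:
        return dict(ngram_scores)
    return {ng: score * (boost if pattern.match(ng) else penalty)
            for ng, score in ngram_scores.items()}
```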

Step 5: Tiling the Answers

Example: the candidate n-grams "Charles Dickens" (score 20), "Dickens" (15) and "Mr Charles" (10) are merged into "Mr Charles Dickens" (score 45), and the old n-grams are discarded.
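A hedged sketch of greedy answer tiling under the assumption that overlapping or contained n-grams are merged and their scores summed (a simplification of the AskMSR tiling step):

```python
def tile_once(ngrams):
    """Merge the first pair of n-grams that overlap or where one contains the other.

    ngrams: dict mapping n-gram string -> score.
    Returns (new_ngrams, merged_flag).
    """
    items = list(ngrams.items())
    for i, (a, sa) in enumerate(items):
        for b, sb in items[i + 1:]:
            merged = _merge(a, b)
            if merged is not None:
                new = dict(ngrams)
                del new[a]
                del new[b]
                new[merged] = new.get(merged, 0) + sa + sb
                return new, True
    return ngrams, False

def _merge(a, b):
    """Return the tiled string if a and b overlap (in either order), else None."""
    wa, wb = a.split(), b.split()
    for x, y in ((wa, wb), (wb, wa)):
        if " ".join(y) in " ".join(x):           # simple substring containment
            return " ".join(x)
        for k in range(1, min(len(x), len(y))):  # suffix of x == prefix of y
            if x[-k:] == y[:k]:
                return " ".join(x + y[k:])
    return None

ngrams = {"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}
while True:
    ngrams, merged = tile_once(ngrams)
    if not merged:
        break
print(ngrams)   # {'Mr Charles Dickens': 45}
```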

Results

Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions

Technique does ok, not great (would have placed in top 9 of ~30 participants)

But with access to the Web… they do much better, would have come in second on TREC 2001

Harder Questions

Factoid question answering is really pretty silly.

A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.

Who is Condoleezza Rice?

Who is Mahmoud Abbas?

Why was Arafat flown to Paris?

IXE Components

IXE Framework

[Component diagram: Passage Index, NE Tagger, EventStream, ContextStream, Sentence Splitter, POS Tagger, Python/Perl/Java wrappers, Clustering, Files, Memory Mapping, Crawler, Synchronization, MaxEntropy, Indexer, Web Service, Unicode, Wrappers, RegExp Search, Suffix Trees, Readers, OS Abstraction, Object Store, Text]

Language Processing Tools

Maximum Entropy classifier

Sentence Splitter

Multi-language POS Tagger

Multi-language NE Tagger

Conceptual clustering

Maximum Entropy

Machine Learning approach to classification:

System trained on test cases

Learned model used for predictions

Classification problem described as a number of features

Each feature corresponds to a constraint on the model

Maximum entropy model: the model with the maximum entropy among all the models that satisfy the constraints

Choosing a model with less entropy would add 'information' constraints not justified by the empirical evidence available

MaxEntropy: example data

Features → Outcome

Sunny, Happy → Outdoor
Sunny, Happy, Dry → Outdoor
Sunny, Happy, Humid → Outdoor
Sunny, Sad, Dry → Outdoor
Sunny, Sad, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Happy, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Cloudy, Sad, Humid → Outdoor
Rainy, Happy, Humid → Indoor
Rainy, Happy, Dry → Indoor
Rainy, Sad, Dry → Indoor
Rainy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor
Cloudy, Sad, Humid → Indoor

MaxEnt: example predictions

Context: Cloudy, Happy, Humid → Outdoor 0.771, Indoor 0.228

Context: Rainy, Sad, Humid → Outdoor 0.001, Indoor 0.998

MaxEntropy: application

Sentence Splitting: not all punctuation marks are sentence boundaries:

U.S.A.

St. Helen

3.14

Use features like:

Capitalization (previous, next word)

Presence in abbreviation list

Suffix/prefix digits

Suffix/prefix length

Precision: > 95%
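A minimal sketch of the kind of features such a sentence splitter might compute for a candidate boundary (the abbreviation list and feature names are illustrative assumptions):

```python
ABBREVIATIONS = {"U.S.A", "St", "Mr", "Dr", "Prof"}   # illustrative, not a full list

def boundary_features(tokens, i):
    """Features for deciding whether the period after tokens[i] ends a sentence."""
    prev_tok = tokens[i]
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "prev_capitalized": prev_tok[:1].isupper(),
        "next_capitalized": next_tok[:1].isupper(),
        "prev_is_abbreviation": prev_tok.rstrip(".") in ABBREVIATIONS,
        "prev_has_digits": any(c.isdigit() for c in prev_tok),
        "next_has_digits": any(c.isdigit() for c in next_tok),
        "prev_is_short": len(prev_tok.rstrip(".")) <= 3,
    }

# e.g. boundary_features(["near", "St.", "Helen", "today", "."], 1)
# -> prev_is_abbreviation=True, suggesting this period is not a sentence boundary
```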

Part of Speech Tagging

TreeTagger: a statistical package based on HMMs and decision trees

Trained on manually tagged text

Full language lexicon (with all inflections: 140,000 words for Italian)

Training Corpus

Il → DET:def:*:*:masc:sg → _il
presidente → NOM:*:*:*:masc:sg → _presidente
della → PRE:det:*:*:femi:sg → _del
Repubblica → NOM:*:*:*:femi:sg → _repubblica
francese → ADJ:*:*:*:femi:sg → _francese
Francois → NPR:*:*:*:*:* → _Francois
Mitterrand → NPR:*:*:*:*:* → _Mitterrand
ha → VER:aux:pres:3:*:sg → _avere
proposto → VER:*:pper:*:masc:sg → _proporre
…

Named Entity Tagger

Uses MaxEntropy

NE categories, top level: NAME, ORGANIZATION, LOCATION, QUANTITY, TIME, EVENT, PRODUCT

Second level: 30-100 categories, e.g. QUANTITY: MONEY, CARDINAL, PERCENT, MEASURE, VOLUME, AGE, WEIGHT, SPEED, TEMPERATURE, etc.

See resources at CoNLL (cnts.uia.ac.be/connl2004)

NE Features

Feature types:

word-level (e.g. capitalization, digits, etc.)

punctuation

POS tag

category designator (Mr, Av.)

category suffix (center, museum, street, etc.)

lowercase intermediate terms (of, de, in)

presence in controlled dictionaries (locations, people, organizations)

Context: words in positions -1, 0, +1

Sample training document

Today the Dow Jones industrial average gained thirty-eight and three quarter points.

When the first American style burger joint opened in London's fashionable Regent street some twenty years ago, it was mobbed.

Now it's Asia's turn.

The temperatures hover in the nineties, the heat index climbs into the hundreds.

And that's continued bad news for Florida where wildfires have charred nearly three hundred square miles in the last month and destroyed more than a hundred homes.

Clustering

Classification: assign an item to one among a given set of classes

Clustering: find groupings of similar items (i.e. generate the classes )

Conceptual Clustering of results

Similar to Vivisimo

Built on the fly rather than from predefined categories (Northern Light)

Generalized suffix tree of snippets

Stemming

Stop words (articulated, essential)

Demo: python, upnp

PiQASso: Pisa Question Answering System “Computers are useless, they can only give answers” Pablo Picasso

PiQASso Architecture

[Architecture diagram: the document collection is split into sentences, parsed with MiniPar and indexed (Answer Indexer); question analysis (MiniPar parse, Question Classification, Query Formulation/Expansion with WNSense and WordNet) produces a query; answer analysis (MiniPar parse, Type Matching, Relation Matching, Answer Scoring, Popularity Ranking) filters the retrieved paragraphs, looping until an answer is found]

Linguistic tools

WNSense extracts lexical knowledge from WordNet:

classifies words according to WordNet top-level categories, weighting their senses

computes distance between words based on is-a links

suggests word alternatives for query expansion

Example: theatre. Categorization: artifact 0.60, communication 0.40. Synonyms: dramaturgy, theater, house, dramatics

Minipar [D. Lin]

Identifies dependency relations between words (e.g. subject, object, modifiers)

Provides POS tagging

Detects semantic types of words (e.g. location, person, organization)

Extensible: we integrated a Maximum Entropy based Named Entity Tagger

[Dependency parse of "What metal has the highest melting point?" showing the subj, obj, lex-mod and mod relations]

Question Analysis

Example: What metal has the highest melting point?

1. Parsing: [dependency tree with subj, obj, lex-mod, mod relations]
2. Keyword extraction: metal, highest, melting, point
3. Answer type detection: SUBSTANCE
4. Relation extraction: <SUBSTANCE, has, subj>, <point, has, obj>, <melting, point, lex-mod>, <highest, point, mod>

Steps:

1. The NL question is parsed
2. POS tags are used to select search keywords
3. The expected answer type is determined by applying heuristic rules to the dependency tree
4. Additional relations are inferred and the answer entity is identified

Answer Analysis

Example: Tungsten is a very dense material and has the highest melting point of any metal.

1. Parsing
2. Answer type check: SUBSTANCE
3. Relation extraction: <tungsten, material, pred>, <tungsten, has, subj>, <point, has, obj>, …
4. Matching distance
5. Distance filtering
6. Popularity ranking → ANSWER: Tungsten

Steps:

1. Parse the retrieved paragraphs
2. Paragraphs not containing an entity of the expected type are discarded
3. Dependency relations are extracted from the MiniPar output
4. The matching distance between word relations in the question and the answer is computed
5. Paragraphs that are too distant are filtered out
6. Popularity rank is used to weight distances

Match Distance between Question and Answer

Analyze relations between corresponding words considering:

number of matching words in question and in answer

distance between words. Ex: moon matching with satellite

relation types. Ex: words in the question related by subj while the matching words in the answer are related by pred
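A minimal sketch of a matching distance over dependency triples along these lines (the word-similarity and relation-compatibility functions are placeholders to be supplied, e.g. via WordNet; this is not PiQASso's actual metric):

```python
def match_distance(question_rels, answer_rels, word_sim, rel_compatible):
    """Distance between question and answer dependency relations.

    question_rels, answer_rels: lists of (head, dependent, relation) triples.
    word_sim(w1, w2):        similarity in [0, 1], e.g. based on WordNet is-a links
                             (so 'moon' can match 'satellite').
    rel_compatible(r1, r2):  True if the relation types are considered compatible
                             (e.g. 'subj' in the question vs. 'pred' in the answer).
    """
    distance = 0.0
    for q_head, q_dep, q_rel in question_rels:
        best = None
        for a_head, a_dep, a_rel in answer_rels:
            if not rel_compatible(q_rel, a_rel):
                continue
            sim = word_sim(q_head, a_head) * word_sim(q_dep, a_dep)
            if best is None or sim > best:
                best = sim
        # unmatched question relations add the maximum penalty
        distance += (1.0 - best) if best is not None else 1.0
    return distance / max(len(question_rels), 1)
```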

http://medialab.di.unipi.it/askpiqasso.html

Improving PIQASso

More NLP

NLP techniques largely unsuccessful at information retrieval

Document retrieval as the primary measure of information retrieval success

Document retrieval reduces the need for NLP techniques:

Discourse factors can be ignored

Query words perform word-sense disambiguation

Lack of robustness: NLP techniques are typically not as robust as word indexing

How do these technologies help?

Question Analysis: the tag of the predicted category is added to the query

Named-Entity Detection: the NE categories found in the text are included as tags in the index

Example: What party is John Kerry in? (ORGANIZATION)

"John Kerry defeated John Edwards in the primaries for the Democratic Party."

Tags: PERSON, ORGANIZATION

NLP Technologies

Coreference Relations:

Interpretation of a paragraph may depend on the context in which it occurs

Description Extraction:

Appositive and predicate nominative constructions provide descriptive terms about entities

Coreference Relations

Represented as annotations associated with words, i.e. placed at the same positions as the referring expression.

Example: How long was Margaret Thatcher the prime minister? (DURATION)

"The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore."

Tags: DURATION. Colocated: her → MARGARET THATCHER

Description Extraction

Identifies a DESCRIPTION category

Allows descriptive terms to be used in term expansion

Example: Who is Frank Gary? (DESCRIPTION); What architect designed the Guggenheim Museum in Bilbao? (PERSON)

"Famed architect Frank Gary… Buildings he designed include the Guggenheim Museum in Bilbao."

Tags: DESCRIPTION, PERSON, LOCATION. Colocation: he → FRANK GARY

NLP Technologies

Question Analysis:

identify the semantic type of the expected answer implicit in the query

Named-Entity Detection:

determine the semantic type of proper nouns and numeric amounts in text

Will it work?

Will these semantic relations improve paragraph retrieval?

Are the implementations robust enough to see a benefit across large document collections and question sets?

Are there enough questions where these relationships are required to find an answer?

Hopefully yes!

Preprocessing

Paragraph Detection

Sentence Detection

Tokenization

POS Tagging

NP-Chunking

Queries to a NE enhanced index

text matches bush

text matches PERSON:bush

text matches LOCATION:* & PERSON:bin-laden

text matches DURATION:* PERSON:margaret-thatcher prime minister

Coreference

Task:

Determine space of entity extents:

Basal noun phrases:

Named entities consisting of multiple basal noun phrases are treated as a single entity

Pre-nominal proper nouns

Possessive pronouns

Determine which extents refer to the same entity in the world

Paragraph Retrieval

Indexing:

add NE tags for each NE category present in the text

add coreference relationships

Use syntactically-based categorical relations to create a DESCRIPTION category for term expansion

Use IXE passage indexer

High Composability

[Class diagram: DocInfo (name, date, size), PassageDoc (text, boundaries), Collection, Cursor.next(), QueryCursor.next(), PassageQueryCursor.next()]

Tagged Documents

select documents where:

text matches bush

text matches PERSON:bush

text matches osama & LOCATION:*

Cursor classes: QueryCursor, QueryCursorWord, QueryCursorTaggedWord

Combination

Searching passages on a collection of tagged documents: a PassageQueryCursor composed over a QueryCursor

Paragraph Retrieval

Retrieval:

Use question analysis component to predict answer category and append it to the question

Evaluate using TREC questions and answer patterns

500 questions

System Overview

Indexing: Documents → Paragraph Splitter → Sentence Splitter → Tokenization → POS Tagger → NE Recognizer → Coreference Resolution → Description Extraction → IXE Indexer → Paragraphs+

Retrieval: Question → Question Analysis → IXE Search → Paragraphs

Conclusion

QA is a challenging task

Involves state-of-the-art techniques in various fields:

IR

NLP

AI

Managing large data sets

Advanced Software Technologies