What’s in store for question-answering?
Prognostications based on corpus analysis of several hundred
million questions
John B. Lowe
Vice President for Language Engineering and Chief Linguist
Ask Jeeves, Inc. Emeryville, CA
October 7, 2000
Joint SIGDAT Conference on Empirical Methods in Natural Language
Processing and Very Large Corpora
Hong Kong
11/6/2015
Overview
• “Take-home” messages when considering the Q-A task:
• Make sure you understand the question
• Know what constitutes an “answer”
• Robustness, Robustness, Robustness
• Some anecdotes and a few statistics
• Query types (keywords, questions, stories, etc.)
• From both the consumer side of AJ (i.e. ask.com) and the
corporate side
• Prognostications
• The best systems will be hybrids
• Knowledge cliff is as tall as ever, if not taller, and it will be some
time before it is climbed. Set expectations accordingly!
Another View of an Overview
• This presentation contains:
• 12 Actual or nearly actual UQs (indeed all queries cited
are real, unless cited from the literature or specifically
marked)
• 7 Rhetorical questions
• 5 Summary statistics covering a subcorpus of
approximately 1B UQs
Definitions: What is an answer?
Increasing length, complexity
• Short, coherent, responsive snippet of text
• Result of a Computation or Deduction
• The Trace of the process of arriving at a result
• Longer snippet of text (a Passage)
• Reference to a document or part of a document
• Summary or extract from a document
• Document
• Set of documents
• Audio, video, etc.
• Some combination of the above
Definitions: what is a query?
Increasing length, complexity
• One or more Keywords
• Keywords with Boolean Operators or
additional user supplied structure
• [numeric or other Parametric Values set via UI]
• Phrases (keywords with linguistic coherence)
• (Grammatical) Sentences with interrogative or
imperative syntax
• Short Discourses, usually concluded with a
question
• Audio, video, etc.
• Some combination of the above
Question-Answering vs. IR
• Classically, question-answering systems provided
answers in response to questions
• In contrast, IR systems provided documents in
response to queries, normally composed of
keywords
Corollaries:
• Providing documents in response to questions is
not question-answering
• Providing answers in response to queries is not IR
However, “the world is not black and white. More
like black and grey” – Graham Greene
Question-Answering vs. IR
[Diagram: four configurations built from Q-A System and IR System boxes, labeled “Classical” Question-Answering (Question in, Answer(s) out), “Classical” Information Retrieval (Keyword(s) in, Document(s) out), TREC-8-like, and Hybrid.]
Query types arranged on a “Difficulty Scale”
Difficulty
• Keywords and “Keywords Plus”
• Short, factual (TREC-8-like) Questions
• “Hard” Lookup Questions
• Questions that look hard but aren’t
• Questions that look hard and are
• “Story Problems” – Two Flavors
• These will usually be all jumbled together!
An Anecdotal Analysis of the
Question-Answering Task,
Based on User Behavior and
Expectations, as Reflected by
What They Ask
NB:
Most of these UQs are from ask.com, the open-domain, consumer-oriented web site.
Some UQs from corporate implementations have
been modified to protect the anonymity of
customers.
Users may not be trainable
• Users often expect the system to derive or
otherwise obtain appropriate context
• At minimum, POS, WSD and other basic linguistic and
semantic distinctions are expected.
• Users may attempt to provide such context if they
feel it is important or unavailable to the system
• In which case, watch out!
• Users may evaluate the system to determine how
best to provide input (and context)
• Often this is done by “experimental input”
• Muddies the user log and challenges adaptive
approaches
Intentions of users are complex
Information Sought Among Ask Jeeves Visitors
[Bar chart, axis 0% to 100%; two values per category, one for each of the series Spring '99 and Winter '99]
• Find info about a subject: 75, 77
• Help with a purchase: 28, 28
• Ask Jeeves for advice: 26, 24
• See if I could stump Jeeves: 17, 16
NB: this data is for ask.com!
Short, factual questions
• Where is Greenwich, CT? (Lehnert 1982)
• 42N, 80W
• About 90 miles north of New York City
• …
• Where is the Taj Mahal? (TREC-8)
• Atlantic City, New Jersey, USA
• Agra, Uttar Pradesh, India
• NB: answer reflects cultural bias of corpus from which
answer was obtained
• What is the meaning of life?
• For Ask Jeeves, it is (found in) a URL
Answers to “Hard” Lookup Questions
“Can my gynacologist [sic] tell my parents if
I’m pregnant?”
• Answers can be very, very short!
• In this case, however, though the length of the
answer is a single bit, an authoritative document
is probably the best response
“Hard” Lookup questions (cont’d)
“Can my gynacologist [sic] tell my
parents if I’m pregnant?”
• Sometimes spelling errors reflect language
competence (and therefore indirectly age)
• Sex of asker is (nominally) clear. This bit of
personalization is of course based on real world
knowledge
• Use of “if” rather than “that” prevents
presupposing the asker is pregnant
• Utility of the answer is very different depending
on whether the presupposition is true or not!
Another “Hard” Lookup Question
• What did Tom Hanks_i say to Private
Ryan as he_i was dying?
• “Answer” is a 34 second snippet which occurs at
about 2:36:00 out of 2:48:00 total duration
• Soundtrack is complex at this moment; hard to pick
out even for native speakers, but the utterance
seems to be:
“earn this … earn it”
How do we “get the answer”?
• Assumptions:
• We have the film and permission to use it.
• We have time-aligned markup of text and video
• We have the tools to handle such multimedia access
• All of these [technical] issues are still a challenge…
• …But the really tough part is still the relationship between
the language and the real world (i.e. primarily linguistics)
• Does the markup indicate the states of people -- alive, dead, or in
between (i.e. “dying”)?
• More importantly, interpreting the question seems to require
Mental Spaces (Fauconnier 1985, 1988, &c).
Mental Spaces Required?
In reality, Tom Hanks never said anything to Private Ryan.
Real World     Movie Space
Tom Hanks      Capt. Miller
Matt Damon     Private Ryan
Conclusion: while IR may bring one within striking distance
of the “answer”, high-level NLU potentially required to
determine if you really got the right one.
Having said all that…
Purely serendipitously, there is an IR
solution to this PARTICULAR question
(using the “encyclopedic” aspect of the
web):
Some search engines retrieve discussions
about this apparently important moment
(which in some ways is the climax of the
movie)
Some “hard” questions are easy
• Sometimes, Big Differences are not important
[ the following two sentences have quite different syntax, but share most of
their answers in common. But note: both cannot be answered “Oh, not far!” ]
English:
What is the distance from Tokyo to Yokohama?
How far is it from Tokyo to Yokohama?
Japanese:
東京と横浜の間の距離は、どのぐらいですか?
toukyou to yokohama no aida no kyori ha, dono gurai desu ka?
東京から横浜までは、どのぐらい離れていますか?
toukyou kara yokohama made ha, dono gurai hanarete imasu ka?
Some “easy” questions are hard
• Small Differences may be important
• “Books by kids”
• “Books for kids”
• “Books for under $20”
• “Books about kids”
[these differ only by “stopwords”]
(Riloff et al. (1994), Pustejovsky, Lexeme, NPR 6/2000)
In this case the “stop words” are critical
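To make the point concrete, here is a minimal sketch of what happens when a keyword pipeline discards stopwords before matching. The stoplist and the whitespace tokenizer are assumptions made for illustration and are not taken from any Ask Jeeves component.

```python
# Hypothetical stoplist for illustration; real stoplists vary by system.
STOPWORDS = {"a", "the", "by", "for", "about", "of"}


def content_terms(query: str) -> frozenset:
    """Lowercase, split on whitespace, and drop stopwords."""
    return frozenset(tok for tok in query.lower().split() if tok not in STOPWORDS)


queries = ["Books by kids", "Books for kids", "Books about kids"]
reduced = {q: content_terms(q) for q in queries}

for q, terms in reduced.items():
    print(f"{q!r:20} -> {sorted(terms)}")

# True when every query collapses to the same bag of terms.
all_same = len(set(reduced.values())) == 1
print("indistinguishable after stopword removal:", all_same)
```

Under these assumptions the three queries become indistinguishable, although each asks for a different set of books.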
Story Problem #1 (“conventional”)
• After Bobrow 1967 (and Dreyfus 1972, 1992)
• NB: requires you to “Show your work”! (i.e. display trace)
“Elizabeth, Brian, Dean and Leslie want to cross a bridge.
They all begin on the same side and have only 17 minutes
to get everyone across to the other side. It is night and
there is only one flashlight. A max of two people can
cross at one time. Any party who crosses, either 1 or 2
people, must have the flashlight with them. The flashlight
must be walked back and forth; it cannot be thrown. Each
student walks at a different speed - Elizabeth 1 minute,
Brian 2 minutes, Dean, 5 minutes, and Leslie 10 minutes.
A pair must walk together at the rate of the slower students
pace. How can they get everyone across in 17 minutes?”
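Once the story has been understood and formalized, the computation itself is small. The sketch below assumes the hard part (turning the text into names, walking times, and rules) has already been done by hand. It runs a shortest-path search over who is still on the starting side and returns a trace of crossings, which is the kind of “answer” this question calls for. The function name `solve` and the state encoding are illustrative choices.

```python
import heapq
from itertools import combinations

# Walking times in minutes, transcribed by hand from the story.
TIMES = {"Elizabeth": 1, "Brian": 2, "Dean": 5, "Leslie": 10}


def solve(times):
    """Return (total_minutes, trace) for the fastest way to get everyone across."""
    everyone = frozenset(times)
    start = (everyone, True)  # (people on the near side, flashlight on near side?)
    heap = [(0, 0, start, ())]  # (elapsed, tie_breaker, state, trace so far)
    best = {start: 0}
    counter = 1
    while heap:
        elapsed, _, (near, light_near), trace = heapq.heappop(heap)
        if not near:
            return elapsed, list(trace)
        if elapsed > best.get((near, light_near), float("inf")):
            continue  # stale heap entry
        side = near if light_near else everyone - near
        for size in (1, 2):  # one or two people cross with the flashlight
            for party in combinations(sorted(side), size):
                cost = max(times[p] for p in party)  # pair walks at slower pace
                new_near = near - set(party) if light_near else near | set(party)
                state = (new_near, not light_near)
                total = elapsed + cost
                if total < best.get(state, float("inf")):
                    best[state] = total
                    arrow = "->" if light_near else "<-"
                    step = f"{arrow} {' + '.join(party)} ({cost} min)"
                    heapq.heappush(heap, (total, counter, state, trace + (step,)))
                    counter += 1
    return None


total, trace = solve(TIMES)
print(f"{total} minutes")
for step in trace:
    print(step)
```

The search finds a 17-minute plan in five crossings. The code says nothing about how a system would get from the user's paragraph to the `TIMES` table and the crossing rules, which is where the difficulty lies.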
Story Problem #2, (“a user’s lament”)
• Typically, a customer support problem
• Often, these are not really questions…
“PLEASE HELP ME! I don't know who to ask.
I want to mail merge a specific category
from [address book] in [email program]
and all I can figure out how to do is merge the
entire […] mailing list. If you can't help me,
please tell me who can. Thank you”
Story Problem #3, (“I’ve almost got it…”)
• Sentence punctuation is poor
• Identification and tokenization of NEs is a challenge
I cant install Age of Empires now that I 've
upgraded to win98 from Win 95 computer says
I have 1GB of hard drive space but the
installation failed after taking 30 minutes with
the words not enough hard drive space.
Should I update the drive to FAT 32 and try
again?
Story Problem #4, (“share my misery”)
"I have SuperOS 1776 and a Hogwarts Color
999 printer. I had to reformat my computer and
now I haven't been able to find a driver to
reinstall the printer.. I've only found a driver for
SuperOS 1492 and 1789. I tried it anyway, and
big surprise, it didn't work.”
• Deep NLP is going to have some trouble here
(even people do!)
• Would IR work better?
Story Problem #5, (“a poem”)
i want to customize my mouse and keyboard
by having mouse in 3-dimensional
and having mouse trails
i also want to slow down the cursor blink rate
am also having trouble ith the left mouse butoon
as i am left handed
which steps do i take to change these things
• The e e cummings approach to keyboard entry…
• The point: tremendous stylistic variation exists in
typography, orthography, conceptualization, and so on.
Statistical Properties of AJ UQs
• Average length of user query (on ask.com; punctuation excluded): ~ 4.8
• UQs which contain an unknown token: ~ 30%
• Unknown tokens which are errors: ~ 60%
• Queries which begin with wh-words: ~ 37%
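Summary statistics of this kind can be computed from a query log roughly as follows. This is a sketch under stated assumptions: the lexicon, the list of wh-words (which here includes “how”), the tokenizer, and the four sample queries are all hypothetical. The share of unknown tokens that are errors is left out because it requires human judgment.

```python
import string

# Hypothetical mini-lexicon and wh-word list, for illustration only.
LEXICON = {"where", "can", "i", "find", "lyrics", "how", "do", "make", "a",
           "web", "page", "cars"}
WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}


def tokenize(query):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    cleaned = query.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def summarize(queries):
    """Compute average length and two percentages over a list of queries."""
    token_lists = [tokenize(q) for q in queries]
    n = len(token_lists)
    with_unknown = sum(any(w not in LEXICON for w in t) for t in token_lists)
    wh_initial = sum(bool(t) and t[0] in WH_WORDS for t in token_lists)
    return {
        "avg_length": sum(len(t) for t in token_lists) / n,
        "pct_with_unknown": 100 * with_unknown / n,
        "pct_wh_initial": 100 * wh_initial / n,
    }


sample = ["where can i find lyrics?", "how do i make a web page?",
          "cars", "britny spers"]
stats = summarize(sample)
print(stats)
```

On a real log the lexicon would be far larger, and the choice of tokenizer would noticeably affect all three figures.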
Building REs from NEs
• …for “Britney Spears”
• Br.*n.*y +Sp.*r.*s
• How to build these “from scratch”?
• But see Brill et al. in this workshop for a
solution!
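A quick way to see what such a pattern buys, and what it costs, is to run it over a few spellings. The pattern below is copied from the slide. Case-insensitive matching, whole-string matching, and the test strings are assumptions added for the example.

```python
import re

# Pattern from the slide; IGNORECASE and fullmatch are added assumptions.
BRITNEY = re.compile(r"Br.*n.*y +Sp.*r.*s", re.IGNORECASE)


def matches(query):
    """True if the whole query matches the name pattern."""
    return BRITNEY.fullmatch(query.strip()) is not None


variants = ["Britney Spears", "Brittany Spears", "britny spers", "Britney  Speers"]
non_names = ["Bob Spears", "Bring money Sparrows"]

for text in variants + non_names:
    print(f"{text!r:26} {matches(text)}")
```

All four spelling variants match and “Bob Spears” does not, but “Bring money Sparrows” matches as well. A hand-built expression this loose over-generates, which is part of why building them “from scratch” is hard.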
Distribution of query lengths
[Histogram: frequency (0% to 20%) against number of tokens per query (1 to 19); one point on the chart is flagged with “!”.]
Zipf’s Law applies to user queries
• Rank-frequency distribution of UQs with 3600 > f > 1100 (for a day or so)
Frequency  Query
…
3524 where can i browse lyrics?
2713 where can i find online airfare specials?
2532 is jeeves gay?
2216 how can i find someone?
2190 where can i find a reverse phone directory?
2120 where can i find information on captech?
1980 (filtered)
1934 (filtered)
1852 where can i find erotica from white shadows?
1787 where can i get driving directions between cities?
1585 am i in love?
1567 how do i make a web page?
1544 cars
1520 where can i find a reverse email directory?
1485 how do i use the internet to find a job?
1420 (filtered)
1336 where can i find the lyrics to songs by eminem?
1323 where can i listen to music online?
1318 where can i find pictures of the latest hairstyles?
1313 where can i find a metric conversion table?
1278 where can i find arcade games online?
…
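A rank-frequency table like the one above can be produced from a raw log with a simple count. The sketch below uses a tiny made-up log. The normalization (trimming and lowercasing) is an assumption, and the counts are chosen only to make the output easy to read.

```python
from collections import Counter


def rank_frequency(queries):
    """Return [(rank, frequency, query)] sorted by descending frequency."""
    counts = Counter(q.strip().lower() for q in queries)
    return [(rank, freq, q)
            for rank, (q, freq) in enumerate(counts.most_common(), start=1)]


# Hypothetical toy log; real logs hold millions of entries.
log = (["cars"] * 6
       + ["Am I in love?"] * 3
       + ["how do i make a web page?"] * 2
       + ["where can i browse lyrics?"])

table = rank_frequency(log)
for rank, freq, query in table:
    # Under Zipf's law, rank * frequency stays roughly constant.
    print(rank, freq, rank * freq, query)
```

With real data the product of rank and frequency is only approximately constant, and the long tail of queries seen once dominates the number of distinct entries.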
Conclusions
Sobering Insights (or Nothing New)?
• The bar has been considerably raised!
“Communication is not accomplished by the exchange of symbolic
expressions. Communication is, rather, the successful
interpretation by an addressee of a speaker’s intent in performing
a linguistic act.” (Green 1996)
• Hybrid approaches will be de rigueur for many practical
applications; need to work on combining outputs from:
• Search engines
• QA systems
• Other inferencing engines (decision tree, CBR, etc.)
• We must make friends with our users (and provide
cognitively appealing UIs)
• “If you ask the same question, you get the same answer!” (a
distinctly unhuman behavior, e.g. “Where?”)
• Knowing when you don’t know: understanding failure modes of
the system and communicating this to the user
Thank You!