A Multilanguage Non-Projective
Dependency Parser
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
Language and Intelligence
“Understanding cannot be measured by
external behavior; it is an internal metric
of how the brain remembers things and
uses its memories to make predictions”.
“The difference between the intelligence of
humans and other mammals is that we
have language”.
Jeff Hawkins, “On Intelligence”, 2004
Hawkins’ Memory-Prediction
framework
The brain uses vast amounts of
memory to create a model of the
world. Everything you know and
have learned is stored in this model.
The brain uses this memory-based
model to make continuous
predictions of future events. It is the
ability to make predictions about the
future that is the crux of intelligence.
More …
“Spoken and written words are just patterns
in the world…
The syntax and semantics of language are
not different from the hierarchical
structure of everyday objects.
We associate spoken words with our
memory of their physical and semantic
counterparts.
Through language one human can invoke
memories and create new juxtapositions
of mental objects in another human.”
Conclusion
The ability to process language should be essential in many computer applications
Why is NLP not needed in IR?
Document retrieval as primary measure of
information retrieval success
Document retrieval reduces the need for
NLP techniques
– Discourse factors can be ignored
– Query words perform word-sense
disambiguation
Lack of robustness:
– NLP techniques are typically not as robust as
word indexing
Question Answering
Question Answering from Open-Domain Text
Search engines return a list of (possibly) relevant documents
Users still have to dig through the returned list to find the answer
QA: give the user a (short) answer
to their question, perhaps
supported by evidence
The Google answer #1
Include question words (why, who,
etc.) in stop-list
Do standard IR
Sometimes this (sort of) works:
– Question: Who was the prime minister
of Australia during the Great
Depression?
– Answer: James Scullin (Labor) 1929–31
[Three result pages: one about Curtin (WWII Labor Prime Minister) from which the answer can be deduced, one about Curtin that lacks the answer, and one about Chifley (Labor Prime Minister) from which the answer can be deduced]
But often it doesn’t…
Question: How much money did IBM
spend on advertising in 2002?
Answer: I dunno, but I’d like to …
The Google answer #2
Take the question and try to find it as a string on the web
Return the next sentence on that web page as the answer
Works brilliantly if this exact
question appears as a FAQ
question, etc.
Works poorly most of the time
But, wait …
AskJeeves
AskJeeves was the most hyped example of
“Question answering”
– Have basically given up now: just web search except
when there are factoid answers of the sort MSN also
does
It largely did pattern matching to match your
question to their own knowledge base of
questions
If that works, you get the human-curated answers
to that known question
If that fails, it falls back to regular web search
A potentially interesting middle ground, but a
fairly weak shadow of real QA
Question Answering at TREC
Consists of answering a set of 500 fact-based questions, e.g. “When was Mozart born?”
Systems were allowed to return 5 ranked
answer snippets to each question.
– IR-style thinking
– Mean Reciprocal Rank (MRR) scoring:
• score 1, 0.5, 0.33, 0.25, 0.2 if the first correct answer is at rank 1, 2, 3, 4, 5; 0 if at rank 6 or beyond
– Mainly Named Entity answers (person, place, date, …)
From 2002, systems were only allowed to return a single exact answer
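The MRR computation above can be sketched as follows (an illustrative helper, assuming we already know, for each question, the rank of the first correct snippet among the five returned):

```python
def mean_reciprocal_rank(ranks):
    """ranks: per question, the 1-based rank of the first correct
    answer, or None if none of the top 5 snippets was correct."""
    total = 0.0
    for rank in ranks:
        if rank is not None and rank <= 5:
            total += 1.0 / rank
    return total / len(ranks)

# Ranks 1, 2 and a miss: (1 + 0.5 + 0) / 3
print(mean_reciprocal_rank([1, 2, None]))  # 0.5
```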
TREC 2000 Results (long)
[Bar chart of MRR scores, from 0 to 0.8, for systems including SMU, Queens, Waterloo, IBM, LIMSI, NTT, and Pisa; SMU scored highest]
Falcon
The Falcon system from SMU was by far the best performing system at TREC 2000
It used NLP and performed deep
semantic processing
Question parse
[Phrase structure parse tree, with S, VP, PP, NP nodes over WP, VBD, DT, JJ, NNP, TO, VB, IN, NN tags, for:]
Who was the first Russian astronaut to walk in space
Question semantic form
[Semantic graph: first, Russian → astronaut (answer type: PERSON) → walk → space]
Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
TREC 2001: no NLP
Best system from Insight Software
using surface patterns
AskMSR uses a Web Mining
approach, by retrieving suggestions
from Web searches
Insight Software: surface patterns approach
Best at TREC 2001: 0.68 MRR
Use of Characteristic Phrases
“When was <person> born”
– Typical answers
• “Mozart was born in 1756.”
• “Gandhi (1869-1948)...”
– Suggests phrases (regular expressions) like
• “<NAME> was born in <BIRTHDATE>”
• “<NAME> ( <BIRTHDATE>-”
– Use of Regular Expressions can help locate
correct answer
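The surface-pattern idea can be sketched with regular expressions (illustrative patterns only, not Insight's actual rule set; the helper name is hypothetical):

```python
import re

def birth_year(name, text):
    # Two patterns from the slide, instantiated as regexes:
    # "<NAME> was born in <BIRTHDATE>" and "<NAME> ( <BIRTHDATE>-"
    patterns = [
        rf"{re.escape(name)} was born in (\d{{4}})",
        rf"{re.escape(name)} \((\d{{4}})-",
    ]
    for pat in patterns:
        m = re.search(pat, text)
        if m:
            return m.group(1)
    return None

print(birth_year("Mozart", "Mozart was born in 1756."))      # 1756
print(birth_year("Gandhi", "Gandhi (1869-1948) led India"))  # 1869
```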
AskMSR: Web Mining
Step 1: Rewrite queries
Intuition: The user’s question is often
syntactically quite close to
sentences that contain the answer
– Where is the Louvre Museum located?
– The Louvre Museum is located in Paris
– Who created the character of Scrooge?
– Charles Dickens created the character of
Scrooge.
Query rewriting
Classify question into seven categories:
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Category-specific transformation rules, e.g. for Where questions, move “is” to all possible locations:
“Where is the Louvre Museum located”
→ “is the Louvre Museum located”
→ “the is Louvre Museum located”
→ “the Louvre is Museum located”
→ “the Louvre Museum is located”
→ “the Louvre Museum located is”
(Nonsense, but who cares? It’s only a few more queries to Google.)
b. Expected answer “Datatype” (e.g. Date, Person, Location, …)
“When was the French Revolution?” → DATE
Hand-crafted classification/rewrite/datatype rules
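Rule (a) can be sketched in a few lines (illustrative only, not the AskMSR implementation; assumes the question starts with the question word and contains a single “is”):

```python
def rewrite_where_question(question):
    words = question.replace("?", "").split()
    words.remove("is")   # drop the moved verb
    content = words[1:]  # drop the question word ("Where")
    # Re-insert "is" at every possible position
    rewrites = []
    for i in range(len(content) + 1):
        rewrites.append(" ".join(content[:i] + ["is"] + content[i:]))
    return rewrites

for q in rewrite_where_question("Where is the Louvre Museum located?"):
    print(q)
# "is the Louvre Museum located" ... "the Louvre Museum located is"
```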
Step 2: Query search engine
Send all rewrites to a Web search
engine
Retrieve top N answers
For speed, rely just on search
engine’s “snippets”, not the full text
of the actual document
Nevertheless …
NLP Technologies are used
Question Analysis:
– identify the semantic type of the
expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper
nouns and numeric amounts in text
Parsing in QA
Top systems in TREC 2005 perform
parsing of queries and answer
paragraphs
Some use specially built parsers
Parsers are slow: ~ 1min/sentence
Parsing Technology
Constituent Parsing
Requires Phrase Structure Grammar
– CFG, PCFG, Unification Grammar
Produces phrase structure parse tree
[Phrase structure parse tree, with S, NP, VP, ADJP nodes, for:]
Rolls-Royce Inc. said it expects its sales to remain steady
Statistical Methods in NLP
Some NLP problems:
– Information extraction
• Named entities, Relationships between entities, etc.
– Finding linguistic structure
• Part-of-speech tagging, Chunking, Parsing
Can be cast as learning mappings:
– Strings to hidden state sequences
• NE extraction, POS tagging
– Strings to strings
• Machine translation
– Strings to trees
• Parsing
– Strings to relational data structures
• Information extraction
Techniques
– Log-linear (Maximum Entropy) taggers
– Probabilistic context-free grammars
(PCFGs)
– Discriminative methods:
• Conditional MRFs, Perceptron, Kernel
methods
POS as Tagging
INPUT:
Profits soared at Boeing Co., easily
topping forecasts on Wall Street.
OUTPUT:
Profits/N soared/V at/P Boeing/N Co./N
,/, easily/ADV topping/V forecasts/N
on/P Wall/N Street/N ./.
NE as Tagging
INPUT:
Profits soared at Boeing Co., easily
topping forecasts on Wall Street.
OUTPUT:
Profits/O soared/O at/O Boeing/BC
Co./IC ,/O easily/O topping/O
forecasts/O on/NA Wall/BL Street/IL
./O
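The NE tags above follow a begin/inside scheme (e.g. BC/IC for a company, BL/IL for a location, O for other). A sketch of how entity spans could be recovered from such tags (a hypothetical decoder, inferred from the example, not the tagger itself):

```python
def extract_entities(tagged):
    """tagged: list of (word, tag) pairs; B*/I* mark begin/inside
    of an entity, the rest of the tag names the entity type."""
    entities, current, etype = [], [], None
    for word, tag in tagged:
        if tag.startswith("B"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [word], tag[1:]
        elif tag.startswith("I") and current:
            current.append(word)
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

tagged = [("Profits", "O"), ("soared", "O"), ("at", "O"),
          ("Boeing", "BC"), ("Co.", "IC"), ("on", "O"),
          ("Wall", "BL"), ("Street", "IL")]
print(extract_entities(tagged))  # [('Boeing Co.', 'C'), ('Wall Street', 'L')]
```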
Statistical Parsers
Probabilistic generative models of language which include parse structure (e.g. Collins 1997)
– Learning consists in estimating the parameters of the model with simple likelihood-based techniques
Conditional parsing models
(Charniak 2000; McDonald 2005)
Results

Method                                              Accuracy
PCFGs (Charniak 97)                                 73.0%
Conditional Models – Decision Trees (Magerman 95)   84.2%
Lexical Dependencies (Collins 96)                   85.5%
Conditional Models – Logistic (Ratnaparkhi 97)      86.9%
Generative Lexicalized Model (Charniak 97)          86.7%
Generative Lexicalized Model (Collins 97)           88.2%
Logistic-inspired Model (Charniak 99)               89.6%
Boosting (Collins 2000)                             89.8%
Linear Models for Parsing and Tagging
Three components:
GEN is a function from a string to a set of
candidates
F maps a candidate to a feature vector
W is a parameter vector
Component 1: GEN
GEN enumerates a set of candidates
for a sentence
She announced a program to promote safety in trucks and vans
[GEN maps the sentence to a set of candidate parse trees]
Examples of GEN
A context-free grammar
A finite-state machine
Top N most probable analyses from a
probabilistic grammar
Component 2: F
F maps a candidate to a feature vector in ℝ^d
F defines the representation of a candidate
[Example: F maps a candidate tree to a vector such as <1, 0, 2, 0, 0, 15, 5>]
Feature
A “feature” is a function on a structure, e.g.,
h(x) = number of times the rule A → B C is seen in x
Feature vector:
A set of functions h1 … hd defines a feature vector
F(x) = <h1(x), h2(x), …, hd(x)>
Component 3: W
W is a parameter vector in ℝ^d
The inner product F(y) · W maps a candidate y to a real-valued score
Putting it all together
X is set of sentences, Y is set of possible
outputs (e.g. trees)
Need to learn a function F : X → Y
GEN, F, W define
F(x) = argmax_{y ∈ GEN(x)} F(y) · W
Choose the highest scoring tree as the most plausible structure
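A toy instantiation of the GEN/F/W scheme (all names, candidates, and features below are illustrative, not taken from any real parser):

```python
def best_candidate(sentence, gen, phi, w):
    """Return argmax over y in GEN(sentence) of phi(y) . w."""
    return max(gen(sentence),
               key=lambda y: sum(f * wi for f, wi in zip(phi(y), w)))

# Toy GEN: two candidate "analyses" per sentence.
gen = lambda s: [("flat", s), ("nested", s)]
# Toy feature map: indicator for "nested", plus a bias feature.
phi = lambda y: [1.0 if y[0] == "nested" else 0.0, 1.0]
w = [2.0, -1.0]  # toy parameter vector

print(best_candidate("she saw it", gen, phi, w))  # ('nested', 'she saw it')
```

The scores here are flat: 0·2 + 1·(−1) = −1 and nested: 1·2 + 1·(−1) = 1, so the nested candidate wins.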
Constituent Parsing
Requires Grammar
– CFG, PCFG, Unification Grammar
Produces phrase structure parse tree
[Phrase structure parse tree, with S, NP, VP, ADJP nodes, for:]
Rolls-Royce Inc. said it expects its sales to remain steady
Dependency Tree
Word-word dependency relations
Far easier to understand and to
annotate
Rolls-Royce Inc. said it expects its sales to remain steady
[dependency tree: word-word arcs drawn above the sentence]
Inductive Dependency Parser
Traditional statistical parsers are
trained directly on the task of tagging
a sentence
An Inductive Parser is instead trained to learn the sequence of parse actions required to build the parse tree
Grammar Not Required
A traditional parser requires a
grammar for generating candidate
trees
An inductive parser needs no
grammar
Parsing as Classification
Inductive dependency parsing
Parsing based on Shift/Reduce
actions
Learn from annotated corpus which
action to perform at each step
Parser Actions
[Diagram: the Shift, Left, and Right actions operate on the top of the stack and the next input token, illustrated on the sentence: Ho/VER:aux visto/VER:pper una/DET ragazza/NOM con/PRE gli/DET occhiali/NOM ./POS]
Dependency Graph
Let R = {r1, …, rm} be the set of permissible dependency types
A dependency graph for a string of words W = w1 … wn is a labeled directed graph D = (W, A), where
(a) W is the set of nodes, i.e. word tokens in the input string,
(b) A is a set of labeled arcs (wi, r, wj), with wi, wj ∈ W, r ∈ R,
(c) for every wj ∈ W, there is at most one arc (wi, r, wj) ∈ A.
Parser State
The parser state is a quadruple ⟨S, I, T, A⟩, where
S is a stack of partially processed tokens
I is a list of (remaining) input tokens
T is a stack of temporary tokens
A is the arc relation for the dependency graph
(w, r, h) ∈ A represents an arc w → h, tagged with dependency r
Parser Actions
Shift: ⟨S, n|I, T, A⟩ → ⟨n|S, I, T, A⟩
Right: ⟨s|S, n|I, T, A⟩ → ⟨S, n|I, T, A ∪ {(s, r, n)}⟩
Left:  ⟨s|S, n|I, T, A⟩ → ⟨S, s|I, T, A ∪ {(n, r, s)}⟩
Parser Algorithm
The parsing algorithm is fully
deterministic and works as follows:
Input Sentence: (w1, p1), (w2, p2), …, (wn, pn)

S = <>
T = <(w1, p1), (w2, p2), …, (wn, pn)>
L = <>
while T != <> do begin
  x = getContext(S, T, L);
  y = estimateAction(model, x);
  performAction(y, S, T, L);
end
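The deterministic loop can be sketched as follows; this is a minimal illustration, with the trained classifier (estimateAction) replaced by a scripted oracle and arcs recorded as (dependent, head) pairs:

```python
def parse(tokens, oracle):
    """Shift/Left/Right loop; oracle(stack, buffer) picks the action."""
    stack, buffer, arcs = [], list(tokens), []
    while buffer:
        action = oracle(stack, buffer)
        if action == "shift" or not stack:
            stack.append(buffer.pop(0))
        elif action == "right":
            # top of stack becomes a dependent of the next token
            arcs.append((stack.pop(), buffer[0]))
        elif action == "left":
            # next token becomes a dependent of the stack top,
            # which is pushed back onto the input
            dep = buffer.pop(0)
            arcs.append((dep, stack[-1]))
            buffer.insert(0, stack.pop())
    return arcs

# Scripted action sequence standing in for the classifier:
actions = iter(["shift", "right", "shift", "shift", "right", "left", "shift"])
arcs = parse(["Ho", "visto", "una", "ragazza"], lambda s, b: next(actions))
print(arcs)  # [('Ho', 'visto'), ('una', 'ragazza'), ('ragazza', 'visto')]
```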
Learning Phase
Learning Features

feature  value
W        word
L        lemma
P        part of speech (POS) tag
M        morphology: e.g. singular/plural
W<       word of the leftmost child node
L<       lemma of the leftmost child node
P<       POS tag of the leftmost child node, if present
M<       whether the leftmost child node is singular/plural
W>       word of the rightmost child node
L>       lemma of the rightmost child node
P>       POS tag of the rightmost child node, if present
M>       whether the rightmost child node is singular/plural
Learning Event

left context: Sosteneva/VER che/PRO
target nodes: leggi/NOM le/DET anti/ADV Serbia/NOM
right context: che/PRO ,/PON erano/VER discusse/ADJ

(-3, W, che), (-3, P, PRO),
(-2, W, leggi), (-2, P, NOM), (-2, M, P), (-2, W<, le), (-2, P<, DET), (-2, M<, P),
(-1, W, anti), (-1, P, ADV),
(0, W, Serbia), (0, P, NOM), (0, M, S),
(+1, W, che), (+1, P, PRO), (+1, W>, erano), (+1, P>, VER), (+1, M>, P),
(+2, W, ,), (+2, P, PON)
Parser Architecture
Modular learners architecture:
– MaxEntropy, MBL, SVM, Winnow,
Perceptron
Features can be selected
Feature used in Experiments

Feature           Token positions
LemmaFeatures     -2 -1 0 1 2 3
PosFeatures       -2 -1 0 1 2 3
MorphoFeatures    -1 0 1 2
DepFeatures       -1 0
PosLeftChildren   2
PosLeftChild      -1 0
DepLeftChild      -1 0
PosRightChildren  2
PosRightChild     -1 0
DepRightChild     -1
PastActions       1
Projectivity
An arc wi → wk is projective iff
for every j with i < j < k or i > j > k,
wi →* wj
A dependency tree is projective iff
every arc is projective
Intuitively: arcs can be drawn on a
plane without intersections
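The projectivity condition can be checked directly from the head assignments; a minimal sketch (the helper is hypothetical, assuming 1-based word indices and 0 for the artificial root):

```python
def is_projective(heads):
    """heads[j-1] = head index of word j (0 = artificial root)."""
    n = len(heads)

    def dominates(i, j):
        # Follow head links upward from j; True if we reach i.
        while j != 0:
            j = heads[j - 1]
            if j == i:
                return True
        return False

    # An arc head->dep is projective iff the head dominates every
    # word strictly between the two endpoints.
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        if head == 0:
            continue
        lo, hi = sorted((dep, head))
        if any(not dominates(head, j) for j in range(lo + 1, hi)):
            return False
    return True

print(is_projective([2, 0, 2]))  # True: both arcs are adjacent
print(is_projective([3, 0, 2]))  # False: arc 3->1 spans the root word 2
```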
Non Projective
Většinu těchto přístrojů lze take používat nejen jako fax , ale
Actions for non-projective arcs
Right2:  ⟨s1|s2|S, n|I, T, A⟩ → ⟨s1|S, n|I, T, A ∪ {(s2, r, n)}⟩
Left2:   ⟨s1|s2|S, n|I, T, A⟩ → ⟨s2|S, s1|I, T, A ∪ {(n, r, s2)}⟩
Right3:  ⟨s1|s2|s3|S, n|I, T, A⟩ → ⟨s1|s2|S, n|I, T, A ∪ {(s3, r, n)}⟩
Left3:   ⟨s1|s2|s3|S, n|I, T, A⟩ → ⟨s2|s3|S, s1|I, T, A ∪ {(n, r, s3)}⟩
Extract: ⟨s1|s2|S, n|I, T, A⟩ → ⟨n|s1|S, I, s2|T, A⟩
Insert:  ⟨S, I, s1|T, A⟩ → ⟨s1|S, I, T, A⟩
Example
Většinu těchto přístrojů lze take používat nejen jako fax , ale
Right2 (nejen → ale) and Left3 (fax →
Většinu)
Examples
zou gemaakt moeten worden in
Extract followed by Insert
zou moeten worden gemaakt in
Experiments
Two configurations were tested:
– three classifiers: one to decide between Shift/Reduce, one to decide which Reduce action, and a third one to choose the dependency in case of a Left/Right action
– two classifiers: one to decide which action to perform and a second one to choose the dependency in case of a Left/Right action
CoNLL-X Shared Task
To assign labeled dependency structures
for a range of languages by means of a
fully automatic dependency parser
Input: tokenized and tagged sentences
Tags: token, lemma, POS, morpho
features, ref. to head, dependency label
For each token, the parser must output its
head and the corresponding dependency
relation
CoNLL-X: Data Format

N   WORD         LEMMA        CPOS  POS       FEATS               HEAD  DEPREL  PHEAD  PDEPREL
1   A            o            art   art       <artd>|F|S          2     >N      _      _
2   direcção     direcção     n     n         F|S                 4     SUBJ    _      _
3   já           já           adv   adv       _                   4     ADVL    _      _
4   mostrou      mostrar      v     v-fin     PS|3S|IND           0     STA     _      _
5   boa_vontade  boa_vontade  n     n         F|S                 4     ACC     _      _
6   ,            ,            punc  punc      _                   4     PUNC    _      _
7   mas          mas          conj  conj-c    <co-vfin>|<co-fmc>  4     CO      _      _
8   a            o            art   art       <artd>|F|S          9     >N      _      _
9   greve        greve        n     n         F|S                 10    SUBJ    _      _
10  prossegue    prosseguir   v     v-fin     PR|3S|IND           4     CJT     _      _
11  em           em           prp   prp       _                   10    ADVL    _      _
12  todas_as     todo_o       pron  pron-det  <quant>|F|P         13    >N      _      _
13  delegações   delegação    n     n         F|P                 11    P<      _      _
14  de           de           prp   prp       <sam->              13    N<      _      _
15  o            o            art   art       <-sam>|<artd>|M|S   16    >N      _      _
16  país         país         n     n         M|S                 14    P<      _      _
17  .            .            punc  punc      _                   4     PUNC    _      _
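Each token line of the format above carries ten tab-separated columns, with “_” for empty fields. A minimal reader for one line might look like this (a hypothetical helper, not the shared-task tooling):

```python
def parse_conll_token(line):
    """Parse one 10-column CoNLL-X token line into a dict."""
    cols = line.rstrip("\n").split("\t")
    return {
        "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
        "cpos": cols[3], "pos": cols[4],
        # FEATS is a |-separated list, "_" when absent
        "feats": [] if cols[5] == "_" else cols[5].split("|"),
        "head": int(cols[6]), "deprel": cols[7],
    }

tok = parse_conll_token("2\tdirecção\tdirecção\tn\tn\tF|S\t4\tSUBJ\t_\t_")
print(tok["head"], tok["deprel"], tok["feats"])  # 4 SUBJ ['F', 'S']
```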
CoNLL-X: Languages
The same parser should handle all
languages
13 languages:
– Arabic, Bulgarian, Chinese, Czech,
Danish, Dutch, Japanese, German,
Portuguese, Slovene, Spanish, Swedish,
Turkish
CoNLL-X: Collections

                          Ar    Cn    Cz     Dk    Du    De    Jp    Pt    Sl    Sp    Se    Tr    Bu
K tokens                  54    337   1,249  94    195   700   151   207   29    89    191   58    190
K sents                   1.5   57.0  72.7   5.2   13.3  39.2  17.0  9.1   1.5   3.3   11.0  5.0   12.8
Tokens/sentence           37.2  5.9   17.2   18.2  14.6  17.8  8.9   22.8  18.7  27.0  17.3  11.5  14.8
CPOSTAG                   14    22    12     10    13    52    20    15    11    15    37    14    11
POSTAG                    19    303   63     24    302   52    77    21    28    38    37    30    53
FEATS                     19    0     61     47    81    0     4     146   51    33    0     82    50
DEPREL                    27    82    78     52    26    46    7     55    25    21    56    25    18
% non-project. relations  0.4   0.0   1.9    1.0   5.4   2.3   1.1   1.3   1.9   0.1   1.0   1.5   0.4
% non-project. sentences  11.2  0.0   23.2   15.6  36.4  27.8  5.3   18.9  22.2  1.7   9.8   11.6  5.4
CoNLL: Evaluation Metrics
Labeled Attachment Score (LAS)
– proportion of “scoring” tokens that are
assigned both the correct head and the
correct dependency relation label
Unlabeled Attachment Score (UAS)
– proportion of “scoring” tokens that are
assigned the correct head
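The two metrics can be sketched as follows (an illustrative helper; real CoNLL-X scoring also decides which tokens count as “scoring” tokens, which is omitted here):

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, deprel) pairs, one per scoring token.
    Returns (LAS, UAS)."""
    n = len(gold)
    # UAS: correct head only; LAS: correct head and label
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas

gold = [(2, "SUBJ"), (0, "ROOT"), (2, "OBJ")]
pred = [(2, "SUBJ"), (0, "ROOT"), (2, "ADVL")]  # one wrong label
print(attachment_scores(gold, pred))  # LAS 2/3, UAS 1.0
```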
CoNLL-X Shared Task Results

            Maximum Entropy                    MBL
Language    LAS%   UAS%   Train s  Parse s    LAS%   UAS%   Train s  Parse s
Arabic      56.43  70.96  181      2.6        59.70  74.69  24       950
Bulgarian   81.15  86.71  452      1.5        79.17  85.92  88       353
Chinese     81.19  86.10  1,156    1.8        72.17  83.08  540      478
Czech       62.10  73.44  13,800   12.8       69.20  80.22  496      13,500
Danish      75.25  80.96  386      3.2        76.13  83.65  52       627
Dutch       67.79  72.71  679      3.3        68.97  74.73  132      923
Japanese    84.17  87.15  129      0.8        83.39  86.73  44       97
German      75.88  80.25  9,315    4.3        79.79  84.31  1,399    3,756
Portuguese  79.40  87.58  1,044    4.9        80.97  87.74  160      670
Slovene     61.97  73.18  98       3.0        62.67  76.60  16       547
Spanish     72.35  76.06  204      2.4        74.37  79.70  54       769
Swedish     75.20  83.03  1,424    2.9        74.85  83.73  96       1,177
Turkish     49.27  65.29  177      2.3        47.58  65.25  43       727
CoNLL-X: Overall Results
(average scores from 36 participant submissions)

            LAS                UAS
Language    Average  Ours      Average  Ours
Arabic      59.94    59.70     73.48    74.69
Bulgarian   79.98    81.15     85.89    86.71
Chinese     78.32    81.19     84.85    86.10
Czech       67.17    69.20     77.01    80.22
Danish      78.31    76.13     84.52    83.65
Dutch       70.73    68.97     75.07    74.73
Japanese    85.86    84.17     89.05    87.15
German      78.58    79.79     82.60    84.31
Portuguese  80.63    80.97     86.46    87.74
Slovene     65.16    62.67     76.53    76.60
Spanish     73.52    74.37     77.76    79.70
Swedish     76.44    74.85     84.21    83.73
Turkish     55.95    49.27     69.35    65.29
Well-formed Parse Tree
A graph D = (W, A) is well-formed iff it
is acyclic, projective and connected
Multiple Heads
Examples include:
– verb coordination in which the subject
or object is an argument of several
verbs
– relative clauses in which words must
satisfy dependencies both inside and
outside the clause
Examples
He designs and develops programs
Il governo garantirà sussidi a coloro che cercheranno lavoro
Solution
He designs and develops programs
Il governo garantirà sussidi a coloro che cercheranno lavoro
[Dependency graphs in which a word receives multiple heads; arc labels include N<PRED, SUBJ, ACC]
Italian Treebank
Using the SI-TAL collection from CNR ILC
Annotations split into separate morpho & functional files
Not all tokens have relations, some have more than one, no accents, …
Implemented some heuristics to generate a corpus in CoNLL format
Tool for visualization and annotation
DgAnnotator
A GUI tool for:
– Annotating texts with dependency relations
– Visualizing and comparing trees
– Generating corpora in XML or CoNLL format
– Exporting DG trees to PNG
Demo
Available at:
http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator/
Future Directions
Opinion Extraction
– Finding opinions (positive/negative)
– Blog track in TREC 2006
Intent Analysis
– Determine author intent, such as:
problem (description, solution),
agreement (assent, dissent), preference
(likes, dislikes), statement (claim,
denial)
References
G. Attardi. 2006. Experiments with a
Multilanguage Non-projective Dependency
Parser. In Proc. CoNLL-X.
H. Yamada, Y. Matsumoto. 2003. Statistical
Dependency Analysis with Support Vector
Machines. In Proc. IWPT.
M. T. Kromann. 2001. Optimality parsing
and local cost functions in discontinuous
grammars. In Proc. FG-MOL.