A Multilanguage Non-Projective
Dependency Parser
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
Language and Intelligence
“Understanding cannot be measured by
external behavior; it is an internal metric
of how the brain remembers things and
uses its memories to make predictions”.
“The difference between the intelligence of
humans and other mammals is that we
have language”.
Jeff Hawkins, “On Intelligence”, 2004
Hawkins’ Memory-Prediction
framework
The brain uses vast amounts of
memory to create a model of the
world. Everything you know and
have learned is stored in this model.
The brain uses this memory-based
model to make continuous
predictions of future events. It is the
ability to make predictions about the
future that is the crux of intelligence.
More …
“Spoken and written words are just patterns
in the world…
The syntax and semantics of language are
not different from the hierarchical
structure of everyday objects.
We associate spoken words with our
memory of their physical and semantic
counterparts.
Through language one human can invoke
memories and create new juxtapositions
of mental objects in another human.”
Conclusion
The ability to process language should be essential in many computer applications
Why is NLP not needed in IR?
Document retrieval as primary measure of
information retrieval success
Document retrieval reduces the need for
NLP techniques
– Discourse factors can be ignored
– Query words perform word-sense
disambiguation
Lack of robustness:
– NLP techniques are typically not as robust as
word indexing
Question Answering
Question Answering from Open-Domain Text
Search engines return a list of (possibly) relevant documents
Users still have to dig through the returned list to find the answer
QA: give the user a (short) answer
to their question, perhaps
supported by evidence
The Google answer #1
Include question words (why, who,
etc.) in stop-list
Do standard IR
Sometimes this (sort of) works:
– Question: Who was the prime minister
of Australia during the Great
Depression?
– Answer: James Scullin (Labor) 1929–31
[Three result pages: one about Curtin (WWII Labor Prime Minister) from which the answer can be deduced, one about Curtin that lacks the answer, and one about Chifley (Labor Prime Minister) from which the answer can be deduced]
But often it doesn’t…
Question: How much money did IBM
spend on advertising in 2002?
Answer: I dunno, but I’d like to …
The Google answer #2
Take the question and try to find it as a string on the web
Return the next sentence on that web page as the answer
Works brilliantly if this exact
question appears as a FAQ
question, etc.
Works poorly most of the time
But, wait …
AskJeeves
AskJeeves was the most hyped example of
“Question answering”
– Have basically given up now: just web search except
when there are factoid answers of the sort MSN also
does
It largely did pattern matching to match your
question to their own knowledge base of
questions
If that works, you get the human-curated answers
to that known question
If that fails, it falls back to regular web search
A potentially interesting middle ground, but a
fairly weak shadow of real QA
Question Answering at TREC
Consists of answering a set of 500 fact-based questions, e.g. “When was Mozart born?”
Systems were allowed to return 5 ranked
answer snippets to each question.
– IR-style thinking
– Mean Reciprocal Rank (MRR) scoring:
• score 1, 0.5, 0.33, 0.25, 0.2 if the first correct answer is at rank 1, 2, 3, 4, 5; 0 if at rank 6 or beyond
– Mainly Named Entity answers (person, place, date, …)
From 2002, systems were only allowed to return a single exact answer
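The MRR computation above can be sketched as follows (an illustrative helper, assuming we already know, for each question, the rank of the first correct snippet among the five returned):

```python
def mean_reciprocal_rank(ranks):
    """ranks: per question, the 1-based rank of the first correct
    answer, or None if none of the top 5 snippets was correct."""
    total = 0.0
    for rank in ranks:
        if rank is not None and rank <= 5:
            total += 1.0 / rank
    return total / len(ranks)

# Ranks 1, 2 and a miss: (1 + 0.5 + 0) / 3
print(mean_reciprocal_rank([1, 2, None]))  # 0.5
```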
TREC 2000 Results (long)
[Bar chart of MRR scores, from 0 to 0.8, for systems including SMU, Queens, Waterloo, IBM, LIMSI, NTT, and Pisa; SMU scored highest]
Falcon
The Falcon system from SMU was by far the best performing system at TREC 2000
It used NLP and performed deep
semantic processing
Question parse
[Phrase structure parse tree, with S, VP, PP, NP nodes over WP, VBD, DT, JJ, NNP, TO, VB, IN, NN tags, for:]
Who was the first Russian astronaut to walk in space
Question semantic form
[Semantic graph: first, Russian → astronaut (answer type: PERSON) → walk → space]
Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
TREC 2001: no NLP
Best system from Insight Software
using surface patterns
AskMSR uses a Web Mining
approach, by retrieving suggestions
from Web searches
Insight Software: surface patterns approach
Best at TREC 2001: 0.68 MRR
Use of Characteristic Phrases
“When was <person> born”
– Typical answers
• “Mozart was born in 1756.”
• “Gandhi (1869-1948)...”
– Suggests phrases (regular expressions) like
• “<NAME> was born in <BIRTHDATE>”
• “<NAME> ( <BIRTHDATE>-”
– Use of Regular Expressions can help locate
correct answer
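The surface-pattern idea can be sketched with regular expressions (illustrative patterns only, not Insight's actual rule set; the helper name is hypothetical):

```python
import re

def birth_year(name, text):
    # Two patterns from the slide, instantiated as regexes:
    # "<NAME> was born in <BIRTHDATE>" and "<NAME> ( <BIRTHDATE>-"
    patterns = [
        rf"{re.escape(name)} was born in (\d{{4}})",
        rf"{re.escape(name)} \((\d{{4}})-",
    ]
    for pat in patterns:
        m = re.search(pat, text)
        if m:
            return m.group(1)
    return None

print(birth_year("Mozart", "Mozart was born in 1756."))      # 1756
print(birth_year("Gandhi", "Gandhi (1869-1948) led India"))  # 1869
```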
AskMSR: Web Mining
Step 1: Rewrite queries
Intuition: The user’s question is often
syntactically quite close to
sentences that contain the answer
– Where is the Louvre Museum located?
– The Louvre Museum is located in Paris
– Who created the character of Scrooge?
– Charles Dickens created the character of
Scrooge.
Query rewriting
Classify question into seven categories:
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Category-specific transformation rules, e.g. for Where questions, move “is” to all possible locations:
“Where is the Louvre Museum located”
→ “is the Louvre Museum located”
→ “the is Louvre Museum located”
→ “the Louvre is Museum located”
→ “the Louvre Museum is located”
→ “the Louvre Museum located is”
(Nonsense, but who cares? It’s only a few more queries to Google.)
b. Expected answer “Datatype” (e.g. Date, Person, Location, …)
“When was the French Revolution?” → DATE
Hand-crafted classification/rewrite/datatype rules
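Rule (a) can be sketched in a few lines (illustrative only, not the AskMSR implementation; assumes the question starts with the question word and contains a single “is”):

```python
def rewrite_where_question(question):
    words = question.replace("?", "").split()
    words.remove("is")   # drop the moved verb
    content = words[1:]  # drop the question word ("Where")
    # Re-insert "is" at every possible position
    rewrites = []
    for i in range(len(content) + 1):
        rewrites.append(" ".join(content[:i] + ["is"] + content[i:]))
    return rewrites

for q in rewrite_where_question("Where is the Louvre Museum located?"):
    print(q)
# "is the Louvre Museum located" ... "the Louvre Museum located is"
```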
Step 2: Query search engine
Send all rewrites to a Web search
engine
Retrieve top N answers
For speed, rely just on search
engine’s “snippets”, not the full text
of the actual document
Nevertheless …
NLP Technologies are used
Question Analysis:
– identify the semantic type of the
expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper
nouns and numeric amounts in text
Parsing in QA
Top systems in TREC 2005 perform
parsing of queries and answer
paragraphs
Some use specially built parsers
Parsers are slow: ~ 1min/sentence
Parsing Technology
Constituent Parsing
Requires Phrase Structure Grammar
– CFG, PCFG, Unification Grammar
Produces phrase structure parse tree
[Phrase structure parse tree, with S, NP, VP, ADJP nodes, for:]
Rolls-Royce Inc. said it expects its sales to remain steady
Statistical Methods in NLP
Some NLP problems:
– Information extraction
• Named entities, Relationships between entities, etc.
– Finding linguistic structure
• Part-of-speech tagging, Chunking, Parsing
Can be cast as learning mappings:
– Strings to hidden state sequences
• NE extraction, POS tagging
– Strings to strings
• Machine translation
– Strings to trees
• Parsing
– Strings to relational data structures
• Information extraction
Techniques
– Log-linear (Maximum Entropy) taggers
– Probabilistic context-free grammars
(PCFGs)
– Discriminative methods:
• Conditional MRFs, Perceptron, Kernel
methods
POS as Tagging
INPUT:
Profits soared at Boeing Co., easily
topping forecasts on Wall Street.
OUTPUT:
Profits/N soared/V at/P Boeing/N Co./N
,/, easily/ADV topping/V forecasts/N
on/P Wall/N Street/N ./.
NE as Tagging
INPUT:
Profits soared at Boeing Co., easily
topping forecasts on Wall Street.
OUTPUT:
Profits/O soared/O at/O Boeing/BC
Co./IC ,/O easily/O topping/O
forecasts/O on/NA Wall/BL Street/IL
./O
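The NE tags above follow a begin/inside scheme (e.g. BC/IC for a company, BL/IL for a location, O for other). A sketch of how entity spans could be recovered from such tags (a hypothetical decoder, inferred from the example, not the tagger itself):

```python
def extract_entities(tagged):
    """tagged: list of (word, tag) pairs; B*/I* mark begin/inside
    of an entity, the rest of the tag names the entity type."""
    entities, current, etype = [], [], None
    for word, tag in tagged:
        if tag.startswith("B"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [word], tag[1:]
        elif tag.startswith("I") and current:
            current.append(word)
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

tagged = [("Profits", "O"), ("soared", "O"), ("at", "O"),
          ("Boeing", "BC"), ("Co.", "IC"), ("on", "O"),
          ("Wall", "BL"), ("Street", "IL")]
print(extract_entities(tagged))  # [('Boeing Co.', 'C'), ('Wall Street', 'L')]
```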
Statistical Parsers
Probabilistic generative models of language which include parse structure (e.g. Collins 1997)
– Learning consists in estimating the parameters of the model with simple likelihood-based techniques
Conditional parsing models
(Charniak 2000; McDonald 2005)
Results

Method                                              Accuracy
PCFGs (Charniak 97)                                 73.0%
Conditional Models – Decision Trees (Magerman 95)   84.2%
Lexical Dependencies (Collins 96)                   85.5%
Conditional Models – Logistic (Ratnaparkhi 97)      86.9%
Generative Lexicalized Model (Charniak 97)          86.7%
Generative Lexicalized Model (Collins 97)           88.2%
Logistic-inspired Model (Charniak 99)               89.6%
Boosting (Collins 2000)                             89.8%
Linear Models for Parsing and Tagging
Three components:
GEN is a function from a string to a set of
candidates
F maps a candidate to a feature vector
W is a parameter vector
Component 1: GEN
GEN enumerates a set of candidates
for a sentence
She announced a program to promote safety in trucks and vans
[GEN maps the sentence to a set of candidate parse trees]
Examples of GEN
A context-free grammar
A finite-state machine
Top N most probable analyses from a
probabilistic grammar
Component 2: F
F maps a candidate to a feature vector in ℝ^d
F defines the representation of a candidate
[Example: F maps a candidate tree to a vector such as <1, 0, 2, 0, 0, 15, 5>]
Feature
A “feature” is a function on a structure, e.g.,
h(x) = number of times the rule A → B C is seen in x
Feature vector:
A set of functions h1 … hd defines a feature vector
F(x) = <h1(x), h2(x), …, hd(x)>
Component 3: W
W is a parameter vector in ℝ^d
The inner product F(y) · W maps a candidate y to a real-valued score
Putting it all together
X is set of sentences, Y is set of possible
outputs (e.g. trees)
Need to learn a function F : X → Y
GEN, F, W define
F(x) = argmax_{y ∈ GEN(x)} F(y) · W
Choose the highest scoring tree as the most plausible structure
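A toy instantiation of the GEN/F/W scheme (all names, candidates, and features below are illustrative, not taken from any real parser):

```python
def best_candidate(sentence, gen, phi, w):
    """Return argmax over y in GEN(sentence) of phi(y) . w."""
    return max(gen(sentence),
               key=lambda y: sum(f * wi for f, wi in zip(phi(y), w)))

# Toy GEN: two candidate "analyses" per sentence.
gen = lambda s: [("flat", s), ("nested", s)]
# Toy feature map: indicator for "nested", plus a bias feature.
phi = lambda y: [1.0 if y[0] == "nested" else 0.0, 1.0]
w = [2.0, -1.0]  # toy parameter vector

print(best_candidate("she saw it", gen, phi, w))  # ('nested', 'she saw it')
```

The scores here are flat: 0·2 + 1·(−1) = −1 and nested: 1·2 + 1·(−1) = 1, so the nested candidate wins.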
Constituent Parsing
Requires Grammar
– CFG, PCFG, Unification Grammar
Produces phrase structure parse tree
[Phrase structure parse tree, with S, NP, VP, ADJP nodes, for:]
Rolls-Royce Inc. said it expects its sales to remain steady
Dependency Tree
Word-word dependency relations
Far easier to understand and to
annotate
Rolls-Royce Inc. said it expects its sales to remain steady
[dependency tree: word-word arcs drawn above the sentence]
Inductive Dependency Parser
Traditional statistical parsers are
trained directly on the task of tagging
a sentence
An Inductive Parser is instead trained to learn the sequence of parse actions required to build the parse tree
Grammar Not Required
A traditional parser requires a
grammar for generating candidate
trees
An inductive parser needs no
grammar
Parsing as Classification
Inductive dependency parsing
Parsing based on Shift/Reduce
actions
Learn from annotated corpus which
action to perform at each step
Parser Actions
[Diagram: the Shift, Left, and Right actions operate on the top of the stack and the next input token, illustrated on the sentence: Ho/VER:aux visto/VER:pper una/DET ragazza/NOM con/PRE gli/DET occhiali/NOM ./POS]
Dependency Graph
Let R = {r1, …, rm} be the set of permissible dependency types
A dependency graph for a string of words W = w1 … wn is a labeled directed graph D = (W, A), where
(a) W is the set of nodes, i.e. word tokens in the input string,
(b) A is a set of labeled arcs (wi, r, wj), with wi, wj ∈ W, r ∈ R,
(c) for every wj ∈ W, there is at most one arc (wi, r, wj) ∈ A.
Parser State
The parser state is a quadruple ⟨S, I, T, A⟩, where
S is a stack of partially processed tokens
I is a list of (remaining) input tokens
T is a stack of temporary tokens
A is the arc relation for the dependency graph
(w, r, h) ∈ A represents an arc w → h, tagged with dependency r
Parser Actions
Shift: ⟨S, n|I, T, A⟩ → ⟨n|S, I, T, A⟩
Right: ⟨s|S, n|I, T, A⟩ → ⟨S, n|I, T, A ∪ {(s, r, n)}⟩
Left:  ⟨s|S, n|I, T, A⟩ → ⟨S, s|I, T, A ∪ {(n, r, s)}⟩
Parser Algorithm
The parsing algorithm is fully
deterministic and works as follows:
Input Sentence: (w1, p1), (w2, p2), …, (wn, pn)

S = <>
T = <(w1, p1), (w2, p2), …, (wn, pn)>
L = <>
while T != <> do begin
  x = getContext(S, T, L);
  y = estimateAction(model, x);
  performAction(y, S, T, L);
end
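The deterministic loop can be sketched as follows; this is a minimal illustration, with the trained classifier (estimateAction) replaced by a scripted oracle and arcs recorded as (dependent, head) pairs:

```python
def parse(tokens, oracle):
    """Shift/Left/Right loop; oracle(stack, buffer) picks the action."""
    stack, buffer, arcs = [], list(tokens), []
    while buffer:
        action = oracle(stack, buffer)
        if action == "shift" or not stack:
            stack.append(buffer.pop(0))
        elif action == "right":
            # top of stack becomes a dependent of the next token
            arcs.append((stack.pop(), buffer[0]))
        elif action == "left":
            # next token becomes a dependent of the stack top,
            # which is pushed back onto the input
            dep = buffer.pop(0)
            arcs.append((dep, stack[-1]))
            buffer.insert(0, stack.pop())
    return arcs

# Scripted action sequence standing in for the classifier:
actions = iter(["shift", "right", "shift", "shift", "right", "left", "shift"])
arcs = parse(["Ho", "visto", "una", "ragazza"], lambda s, b: next(actions))
print(arcs)  # [('Ho', 'visto'), ('una', 'ragazza'), ('ragazza', 'visto')]
```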
Learning Phase
Learning Features

feature  value
W        word
L        lemma
P        part of speech (POS) tag
M        morphology: e.g. singular/plural
W<       word of the leftmost child node
L<       lemma of the leftmost child node
P<       POS tag of the leftmost child node, if present
M<       whether the leftmost child node is singular/plural
W>       word of the rightmost child node
L>       lemma of the rightmost child node
P>       POS tag of the rightmost child node, if present
M>       whether the rightmost child node is singular/plural
Learning Event

left context: Sosteneva/VER che/PRO
target nodes: leggi/NOM le/DET anti/ADV Serbia/NOM
right context: che/PRO ,/PON erano/VER discusse/ADJ

(-3, W, che), (-3, P, PRO),
(-2, W, leggi), (-2, P, NOM), (-2, M, P), (-2, W<, le), (-2, P<, DET), (-2, M<, P),
(-1, W, anti), (-1, P, ADV),
(0, W, Serbia), (0, P, NOM), (0, M, S),
(+1, W, che), (+1, P, PRO), (+1, W>, erano), (+1, P>, VER), (+1, M>, P),
(+2, W, ,), (+2, P, PON)
Parser Architecture
Modular learners architecture:
– MaxEntropy, MBL, SVM, Winnow,
Perceptron
Features can be selected
Feature used in Experiments

Feature           Token positions
LemmaFeatures     -2 -1 0 1 2 3
PosFeatures       -2 -1 0 1 2 3
MorphoFeatures    -1 0 1 2
DepFeatures       -1 0
PosLeftChildren   2
PosLeftChild      -1 0
DepLeftChild      -1 0
PosRightChildren  2
PosRightChild     -1 0
DepRightChild     -1
PastActions       1
Projectivity
An arc wi → wk is projective iff
for every j with i < j < k or i > j > k,
wi →* wj
A dependency tree is projective iff
every arc is projective
Intuitively: arcs can be drawn on a
plane without intersections
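The projectivity condition can be checked directly from the head assignments; a minimal sketch (the helper is hypothetical, assuming 1-based word indices and 0 for the artificial root):

```python
def is_projective(heads):
    """heads[j-1] = head index of word j (0 = artificial root)."""
    n = len(heads)

    def dominates(i, j):
        # Follow head links upward from j; True if we reach i.
        while j != 0:
            j = heads[j - 1]
            if j == i:
                return True
        return False

    # An arc head->dep is projective iff the head dominates every
    # word strictly between the two endpoints.
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        if head == 0:
            continue
        lo, hi = sorted((dep, head))
        if any(not dominates(head, j) for j in range(lo + 1, hi)):
            return False
    return True

print(is_projective([2, 0, 2]))  # True: both arcs are adjacent
print(is_projective([3, 0, 2]))  # False: arc 3->1 spans the root word 2
```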
Non Projective
Většinu těchto přístrojů lze take používat nejen jako fax , ale
Actions for non-projective arcs
Right2:  ⟨s1|s2|S, n|I, T, A⟩ → ⟨s1|S, n|I, T, A ∪ {(s2, r, n)}⟩
Left2:   ⟨s1|s2|S, n|I, T, A⟩ → ⟨s2|S, s1|I, T, A ∪ {(n, r, s2)}⟩
Right3:  ⟨s1|s2|s3|S, n|I, T, A⟩ → ⟨s1|s2|S, n|I, T, A ∪ {(s3, r, n)}⟩
Left3:   ⟨s1|s2|s3|S, n|I, T, A⟩ → ⟨s2|s3|S, s1|I, T, A ∪ {(n, r, s3)}⟩
Extract: ⟨s1|s2|S, n|I, T, A⟩ → ⟨n|s1|S, I, s2|T, A⟩
Insert:  ⟨S, I, s1|T, A⟩ → ⟨s1|S, I, T, A⟩
Example
Většinu těchto přístrojů lze take používat nejen jako fax , ale
Right2 (nejen → ale) and Left3 (fax →
Většinu)
Examples
zou gemaakt moeten worden in
Extract followed by Insert
zou moeten worden gemaakt in
Experiments
Two configurations were tested:
– three classifiers: one to decide between Shift/Reduce, one to decide which Reduce action, and a third one to choose the dependency in case of a Left/Right action
– two classifiers: one to decide which action to perform and a second one to choose the dependency in case of a Left/Right action
CoNLL-X Shared Task
To assign labeled dependency structures
for a range of languages by means of a
fully automatic dependency parser
Input: tokenized and tagged sentences
Tags: token, lemma, POS, morpho
features, ref. to head, dependency label
For each token, the parser must output its
head and the corresponding dependency
relation
CoNLL-X: Data Format

N   WORD         LEMMA        CPOS  POS       FEATS               HEAD  DEPREL  PHEAD  PDEPREL
1   A            o            art   art       <artd>|F|S          2     >N      _      _
2   direcção     direcção     n     n         F|S                 4     SUBJ    _      _
3   já           já           adv   adv       _                   4     ADVL    _      _
4   mostrou      mostrar      v     v-fin     PS|3S|IND           0     STA     _      _
5   boa_vontade  boa_vontade  n     n         F|S                 4     ACC     _      _
6   ,            ,            punc  punc      _                   4     PUNC    _      _
7   mas          mas          conj  conj-c    <co-vfin>|<co-fmc>  4     CO      _      _
8   a            o            art   art       <artd>|F|S          9     >N      _      _
9   greve        greve        n     n         F|S                 10    SUBJ    _      _
10  prossegue    prosseguir   v     v-fin     PR|3S|IND           4     CJT     _      _
11  em           em           prp   prp       _                   10    ADVL    _      _
12  todas_as     todo_o       pron  pron-det  <quant>|F|P         13    >N      _      _
13  delegações   delegação    n     n         F|P                 11    P<      _      _
14  de           de           prp   prp       <sam->              13    N<      _      _
15  o            o            art   art       <-sam>|<artd>|M|S   16    >N      _      _
16  país         país         n     n         M|S                 14    P<      _      _
17  .            .            punc  punc      _                   4     PUNC    _      _
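Each token line of the format above carries ten tab-separated columns, with “_” for empty fields. A minimal reader for one line might look like this (a hypothetical helper, not the shared-task tooling):

```python
def parse_conll_token(line):
    """Parse one 10-column CoNLL-X token line into a dict."""
    cols = line.rstrip("\n").split("\t")
    return {
        "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
        "cpos": cols[3], "pos": cols[4],
        # FEATS is a |-separated list, "_" when absent
        "feats": [] if cols[5] == "_" else cols[5].split("|"),
        "head": int(cols[6]), "deprel": cols[7],
    }

tok = parse_conll_token("2\tdirecção\tdirecção\tn\tn\tF|S\t4\tSUBJ\t_\t_")
print(tok["head"], tok["deprel"], tok["feats"])  # 4 SUBJ ['F', 'S']
```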
CoNLL-X: Languages
The same parser should handle all
languages
13 languages:
– Arabic, Bulgarian, Chinese, Czech,
Danish, Dutch, Japanese, German,
Portuguese, Slovene, Spanish, Swedish,
Turkish
CoNLL-X: Collections

                          Ar    Cn    Cz     Dk    Du    De    Jp    Pt    Sl    Sp    Se    Tr    Bu
K tokens                  54    337   1,249  94    195   700   151   207   29    89    191   58    190
K sents                   1.5   57.0  72.7   5.2   13.3  39.2  17.0  9.1   1.5   3.3   11.0  5.0   12.8
Tokens/sentence           37.2  5.9   17.2   18.2  14.6  17.8  8.9   22.8  18.7  27.0  17.3  11.5  14.8
CPOSTAG                   14    22    12     10    13    52    20    15    11    15    37    14    11
POSTAG                    19    303   63     24    302   52    77    21    28    38    37    30    53
FEATS                     19    0     61     47    81    0     4     146   51    33    0     82    50
DEPREL                    27    82    78     52    26    46    7     55    25    21    56    25    18
% non-project. relations  0.4   0.0   1.9    1.0   5.4   2.3   1.1   1.3   1.9   0.1   1.0   1.5   0.4
% non-project. sentences  11.2  0.0   23.2   15.6  36.4  27.8  5.3   18.9  22.2  1.7   9.8   11.6  5.4
CoNLL: Evaluation Metrics
Labeled Attachment Score (LAS)
– proportion of “scoring” tokens that are
assigned both the correct head and the
correct dependency relation label
Unlabeled Attachment Score (UAS)
– proportion of “scoring” tokens that are
assigned the correct head
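The two metrics can be sketched as follows (an illustrative helper; real CoNLL-X scoring also decides which tokens count as “scoring” tokens, which is omitted here):

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, deprel) pairs, one per scoring token.
    Returns (LAS, UAS)."""
    n = len(gold)
    # UAS: correct head only; LAS: correct head and label
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas

gold = [(2, "SUBJ"), (0, "ROOT"), (2, "OBJ")]
pred = [(2, "SUBJ"), (0, "ROOT"), (2, "ADVL")]  # one wrong label
print(attachment_scores(gold, pred))  # LAS 2/3, UAS 1.0
```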
CoNLL-X Shared Task Results

            Maximum Entropy                    MBL
Language    LAS%   UAS%   Train s  Parse s    LAS%   UAS%   Train s  Parse s
Arabic      56.43  70.96  181      2.6        59.70  74.69  24       950
Bulgarian   81.15  86.71  452      1.5        79.17  85.92  88       353
Chinese     81.19  86.10  1,156    1.8        72.17  83.08  540      478
Czech       62.10  73.44  13,800   12.8       69.20  80.22  496      13,500
Danish      75.25  80.96  386      3.2        76.13  83.65  52       627
Dutch       67.79  72.71  679      3.3        68.97  74.73  132      923
Japanese    84.17  87.15  129      0.8        83.39  86.73  44       97
German      75.88  80.25  9,315    4.3        79.79  84.31  1,399    3,756
Portuguese  79.40  87.58  1,044    4.9        80.97  87.74  160      670
Slovene     61.97  73.18  98       3.0        62.67  76.60  16       547
Spanish     72.35  76.06  204      2.4        74.37  79.70  54       769
Swedish     75.20  83.03  1,424    2.9        74.85  83.73  96       1,177
Turkish     49.27  65.29  177      2.3        47.58  65.25  43       727
CoNLL-X: Overall Results
(average scores from 36 participant submissions)

            LAS                UAS
Language    Average  Ours      Average  Ours
Arabic      59.94    59.70     73.48    74.69
Bulgarian   79.98    81.15     85.89    86.71
Chinese     78.32    81.19     84.85    86.10
Czech       67.17    69.20     77.01    80.22
Danish      78.31    76.13     84.52    83.65
Dutch       70.73    68.97     75.07    74.73
Japanese    85.86    84.17     89.05    87.15
German      78.58    79.79     82.60    84.31
Portuguese  80.63    80.97     86.46    87.74
Slovene     65.16    62.67     76.53    76.60
Spanish     73.52    74.37     77.76    79.70
Swedish     76.44    74.85     84.21    83.73
Turkish     55.95    49.27     69.35    65.29
Well-formed Parse Tree
A graph D = (W, A) is well-formed iff it
is acyclic, projective and connected
Multiple Heads
Examples include:
– verb coordination in which the subject
or object is an argument of several
verbs
– relative clauses in which words must
satisfy dependencies both inside and
outside the clause
Examples
He designs and develops programs
Il governo garantirà sussidi a coloro che cercheranno lavoro
Solution
He designs and develops programs
Il governo garantirà sussidi a coloro che cercheranno lavoro
[Dependency graphs in which a word receives multiple heads; arc labels include N<PRED, SUBJ, ACC]
Italian Treebank
Using the SI-TAL collection from CNR ILC
Annotations split into separate morpho & functional files
Not all tokens have relations, some have more than one, no accents, …
Implemented some heuristics to generate a corpus in CoNLL format
Tool for visualization and annotation
DgAnnotator
A GUI tool for:
– Annotating texts with dependency relations
– Visualizing and comparing trees
– Generating corpora in XML or CoNLL format
– Exporting DG trees to PNG
Demo
Available at:
http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator/
Future Directions
Opinion Extraction
– Finding opinions (positive/negative)
– Blog track in TREC 2006
Intent Analysis
– Determine author intent, such as:
problem (description, solution),
agreement (assent, dissent), preference
(likes, dislikes), statement (claim,
denial)
References
G. Attardi. 2006. Experiments with a
Multilanguage Non-projective Dependency
Parser. In Proc. CoNLL-X.
H. Yamada, Y. Matsumoto. 2003. Statistical
Dependency Analysis with Support Vector
Machines. In Proc. IWPT.
M. T. Kromann. 2001. Optimality parsing
and local cost functions in discontinuous
grammars. In Proc. FG-MOL.