Overview of Multilingual Question Answering 2008


CLEF 2008
Multilingual Question Answering Track
UNED
Anselmo Peñas
Valentín Sama
Álvaro Rodrigo
CELCT
Danilo Giampiccolo
Pamela Forner
QA 2008 Tasks and Exercises

- QA Main Task (6th edition)
  - Pilot: QA WSD, English newswire collections with Word Sense Disambiguation
- Answer Validation Exercise – AVE (3rd edition)
- QA on Speech Transcripts – QAST (2nd edition)
Main Task QA 2008

Organizing Committee:
- CELCT (D. Giampiccolo, P. Forner): Italian
- UNED (A. Peñas): Spanish
- U. Groningen (G. Bosma): Dutch
- U. Limerick (R. Sutcliffe): English
- DFKI (B. Sacaleanu): German
- ELDA/ELRA (N. Moreau): French
- Linguateca (P. Rocha): Portuguese
- Bulgarian Academy of Sciences (P. Osenova): Bulgarian
- IASI (C. Forascu): Romanian
- U. Basque Country (I. Alegria): Basque
- ILSP (P. Prokopidis): Greek
Evolution of the Track

Year | Target languages | Collections           | Type of questions                      | Supporting information | Pilots and Exercises
2003 | 3                | News 1994             | 200 Factoid                            | Doc.                   | -
2004 | 7                | + News 1995           | + Definitions                          | Doc.                   | -
2005 | 8                |                       | + Temporal restrictions                | Doc.                   | -
2006 | 9                |                       | + Lists                                | Snippet                | AVE, Real Time, WiQA
2007 | 10               | + Wikipedia Nov. 2006 | + Linked questions, - Type of question | Snippet                | AVE, QAST
2008 | 11               |                       | + Closed lists                         | Snippet                | AVE, QAST, WSDQA
200 questions

- FACTOID (loc, mea, org, per, tim, cnt, obj, oth)
- DEFINITION (per, org, obj, oth)
- CLOSED LIST
  - Who were the components of The Beatles?
  - Who were the last three presidents of Italy?
- LINKED QUESTIONS
  - Who was called the "Iron-Chancellor"?
  - When was he born?
  - Who was his first wife?
- Temporal restrictions: by date, by period, by event
- NIL questions (without known answer in the collection)
43 Activated Language Combinations
(at least one registered participant)

SOURCE LANGUAGES (questions): BG, DE, EL, EN, ES, EU, FR, IT, NL, PT, RO
TARGET LANGUAGES (corpus and answers): BG, DE, EL, EN, ES, EU, FR, IT, NL, PT, RO
Activated Tasks

          | Monolingual | Cross-lingual | Total
CLEF 2003 | 3           | 5             | 8
CLEF 2004 | 6           | 13            | 19
CLEF 2005 | 8           | 15            | 23
CLEF 2006 | 7           | 17            | 24
CLEF 2007 | 8           | 29            | 37
CLEF 2008 | 10          | 33            | 43
Submitted runs

          | Total      | Monolingual | Cross-lingual
CLEF 2003 | 17         | 6           | 11
CLEF 2004 | 48 (+182%) | 20          | 28
CLEF 2005 | 67 (+40%)  | 43          | 24
CLEF 2006 | 77 (+15%)  | 42          | 35
CLEF 2007 | 37 (-52%)  | 20          | 17
CLEF 2008 | 51 (+38%)  | 31          | 20
Participant groups

          | Newcomers | Veterans | Total      | Registered
CLEF 2003 | -         | -        | 8          | -
CLEF 2004 | 13        | 5        | 18 (+125%) | 22
CLEF 2005 | 9         | 15       | 24 (+33%)  | 27
CLEF 2006 | 10        | 20       | 30 (+25%)  | 36
CLEF 2007 | 8         | 14       | 22 (-26%)  | 29
CLEF 2008 | 8         | 13       | 21         | 33
List of Participants (random order)
[Table of participating groups and their countries; not recoverable from the transcript]
Groups per year and target collection
45
40
35
30
25
20
15
10
5
0
2003
Natural
selection?
Task
Change
2004
2005
2006
2007
2008
Above 20 groups
Greek
Finnish
French
Spanish
English
Italian
Ducth
Bulgarian
Basque
Romanian
German
Portuguese
Groups per target collection
[Chart: number of groups per target collection per year (2003-2008): English, Spanish, French, Portuguese, German, Romanian, Italian, Bulgarian, Dutch, Basque, Finnish, Greek]
2008 participation: Comparative evaluation?

Language   | Runs | Different groups
Spanish    | 9    | 6
English    | 10   | 4
German     | 5    | 4
Romanian   | 11   | 3
Dutch      | 4    | 2
Basque     | 4    | 1
French     | 4    | 1
Bulgarian  | 3    | 1
Italian    | 1    | 1
Greek      | 0    | 0
Portuguese | 0    | 0

Shortcoming from the evaluation perspective: 4 languages without comparison between different groups.

Breakout session
Results: Best and Average scores
[Chart: best and average accuracy (%) per year, 2003-2008, for four series: Best Monolingual, Average Monolingual, Best Bilingual, Average Bilingual]
Best scores by language
[Chart: best accuracy (%) per target language (Spanish, English, German, French, Italian, Dutch, Portuguese, Romanian) for each year, Best 2004 through Best 2008]

Best scores by participant
[Chart: best accuracy per participating group; values not recoverable from the transcript]
Results depend on type of questions

- Definitions
  - Almost solved for several systems (80%-95%)
  - Now Wikipedia provides more answers
- Factoids
  - 50%-65% for several systems
- Linked questions
  - Still very difficult
- Closed lists
  - Same level of difficulty as factoids for some systems
- Temporal restrictions
  - Still very difficult
Conclusion

- Same task as 2007
- Same level of participation (slightly better):
  - 11 target languages (9 with participation)
  - 43 activated subtasks
  - 21 participants
  - 51 runs
- Same results (slightly better)
Future direction

- Fewer participants per language
  - Poor comparison between groups
  - Change methodology: one task for all
- Criticisms of QA over Wikipedia
  - Easier to find questions with IR
  - No user model
  - Change collection
- QA proposal for 2009
  - SC and breakout