Transcript PowerPoint

CS 430: Information Discovery
Lecture 10
Cranfield and TREC
1
Course administration
• Assignment 1 grades will be provided after the mid-semester break
• Assignment 2 has been posted
• Wednesday evening -- 2nd class on Perl
2
Retrieval Effectiveness
Designing an information retrieval system involves many
decisions (illustrated in a small sketch after this slide):
Manual or automatic indexing?
Natural language or controlled vocabulary?
What stoplists?
What stemming methods?
What query syntax?
etc.
How do we know which of these methods are most effective?
Is everything a matter of judgment?
3
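To make these design decisions concrete, here is a minimal sketch of how a stoplist and a stemming rule change the index terms a document contributes. The stoplist, the crude suffix-stripping rule, and the sample sentence are invented for illustration; they are not the choices used in any of the experiments discussed in this lecture.

```python
import re

# Illustrative stoplist -- a toy assumption, not the list used in any
# of the experiments discussed here.
STOPLIST = {"the", "of", "a", "an", "in", "and", "is", "are"}

def crude_stem(word):
    # A deliberately simple suffix-stripping rule, standing in for a
    # real stemmer such as Porter's algorithm.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text, use_stoplist=True, use_stemming=True):
    # Words are taken to be alphanumeric strings.
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    if use_stoplist:
        words = [w for w in words if w not in STOPLIST]
    if use_stemming:
        words = [crude_stem(w) for w in words]
    return sorted(set(words))

doc = "The testing of indexing systems in aeronautics"
print(index_terms(doc, use_stoplist=False, use_stemming=False))
print(index_terms(doc))   # stoplist and stemming change the index terms
```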
Studies of Retrieval Effectiveness
• The Cranfield Experiments, Cyril W. Cleverdon, Cranfield
College of Aeronautics, 1957 -1968
• SMART System, Gerald Salton, Cornell University, 1964-1988
• TREC, Donna Harman, National Institute of Standards and
Technology (NIST), 1992 -
4
Cranfield Experiments (Example)
Comparative efficiency of indexing systems:
(Universal Decimal Classification, alphabetical subject index, a
special facet classification, Uniterm system of co-ordinate
indexing)
Four indexes were prepared manually for each document in three batches
of 6,000 documents -- a total of 18,000 documents, each indexed four
times. The documents were reports and papers in aeronautics.
Indexes for testing were prepared on index cards and other cards.
Very careful control of indexing procedures.
5
Cranfield Experiments (continued)
Searching:
• 1,200 test questions, each satisfied by at least one document
• Reviewed by expert panel
• Searches carried out by 3 expert librarians
• Two rounds of searching to develop testing methodology
• Subsidiary experiments at English Electric Whetstone
Laboratory and Western Reserve University
6
The Cranfield Data
The Cranfield data was made widely available and used by other
researchers
• Salton used the Cranfield data with the SMART system (a) to
study the relationship between recall and precision, and (b) to
compare automatic indexing with human indexing
• Sparck Jones and van Rijsbergen used the Cranfield data for
experiments in relevance weighting, clustering, definition of test
corpora, etc.
7
Some Cranfield Results
• The various manual indexing systems have similar retrieval
efficiency
• Retrieval effectiveness using automatic indexing can be at
least as effective as manual indexing with controlled
vocabularies
-> original results from the Cranfield experiments
-> considered counter-intuitive
-> other results since then have supported this conclusion
8
Cranfield Experiments -- Analysis
Cleverdon introduced recall and precision, based on the concept of
relevance.
[Graph: precision (%) plotted against recall (%), showing the region occupied by practical systems]
9
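As a minimal sketch of the two measures (the document identifiers and relevance judgments below are invented): recall is the fraction of all relevant documents that are retrieved, and precision is the fraction of retrieved documents that are relevant.

```python
def recall_precision(retrieved, relevant):
    # retrieved, relevant: sets of document identifiers.
    # Recall    = |retrieved & relevant| / |relevant|
    # Precision = |retrieved & relevant| / |retrieved|
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Invented example: 4 of the 10 retrieved documents are among the 8 relevant ones.
retrieved = {f"d{i}" for i in range(1, 11)}          # d1 .. d10
relevant = {"d2", "d4", "d6", "d8", "d20", "d21", "d22", "d23"}
r, p = recall_precision(retrieved, relevant)
print(f"recall = {r:.0%}, precision = {p:.0%}")      # recall = 50%, precision = 40%
```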
The Cranfield methodology
• Recall and precision depend on the concept of relevance
-> Is relevance a context- and task-independent property of documents?
"Relevance is the correspondence in context between an
information requirement statement (a query) and an article
(a document), that is, the extent to which the article covers
the material that is appropriate to the requirement
statement."
F. W. Lancaster, 1979
10
Relevance
• Recall and precision values are for a specific set of
documents and a specific set of queries
• Relevance is subjective, but experimental evidence
suggests that for textual documents different experts have
similar judgments about relevance
• Estimates of relevance level are less consistent
• Query types are important, depending on specificity
-> subject-heading queries
-> title queries
-> paragraphs
Tests should use realistic queries
11
Text Retrieval Conferences (TREC)
• Led by Donna Harman (NIST), with DARPA support
• Annual since 1992
• Corpus of several million textual documents, a total of more than
five gigabytes of data
• Researchers attempt a standard set of tasks (sketched below):
-> search the corpus for topics provided by surrogate users
-> match a stream of incoming documents against standard
queries
• Participants include large commercial companies, small
information retrieval vendors, and university research groups.
12
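The two standard tasks differ mainly in what is fixed and what arrives over time. The following sketch uses an invented corpus, invented topics, and a deliberately crude term-overlap score (none of it is TREC data or a TREC scoring method): the first task ranks a fixed corpus against a new topic, the second matches each incoming document against standing queries.

```python
def term_overlap(query_terms, doc_text):
    # Crude relevance score: count of query terms present in the document.
    words = set(doc_text.lower().split())
    return sum(1 for t in query_terms if t in words)

# Task 1: search a fixed corpus for a newly issued topic.
corpus = {
    "doc1": "court fines imposed after aircraft accident",
    "doc2": "weather report for scotland",
}
topic = {"court", "fines", "aircraft"}
ranked = sorted(corpus, key=lambda d: term_overlap(topic, corpus[d]), reverse=True)
print("ranking for new topic:", ranked)

# Task 2: match a stream of incoming documents against standing queries.
standing_queries = {"q1": {"court", "tribunal"}, "q2": {"weather"}}
incoming = "tribunal hears claims against airline"
matches = [q for q, terms in standing_queries.items() if term_overlap(terms, incoming) > 0]
print("incoming document routed to:", matches)
```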
The TREC Corpus
Source                          Size (Mbytes)    # Docs    Median words/doc
Wall Street Journal, 87-89            267         98,732         245
Associated Press newswire, 89         254         84,678         446
Computer Selects articles             242         75,180         200
Federal Register, 89                  260         25,960         391
Abstracts of DOE publications         184        226,087         111

Wall Street Journal, 90-92            242         74,520         301
Associated Press newswire, 88         237         79,919         438
Computer Selects articles             175         56,920         182
Federal Register, 88                  209         19,860         396
13
The TREC Corpus (continued)
Source                          Size (Mbytes)    # Docs    Median words/doc
San Jose Mercury News, 91             287         90,257         379
Associated Press newswire, 90         237         78,321         451
Computer Selects articles             345        161,021         122
U.S. patents, 93                      243          6,711       4,445

Financial Times, 91-94                564        210,158         316
Federal Register, 94                  395         55,630         588
Congressional Record, 93              235         27,922         288

Foreign Broadcast Information         470        130,471         322
LA Times                              475        131,896         351
14
The TREC Corpus (continued)
Notes:
1. The TREC corpus consists mainly of general articles. The
Cranfield data was in a specialized engineering domain.
2. The TREC data is raw data (see the tokenization sketch below):
-> No stop words are removed; no stemming is applied
-> Words are alphanumeric strings
-> No attempt is made to correct spelling, sentence fragments, etc.
15
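A sketch of what "words are alphanumeric strings" amounts to in practice; the sample line is invented, and this is only one plausible reading of the convention, not the exact TREC preprocessing code.

```python
import re

def trec_style_tokens(text):
    # Words are alphanumeric strings; no stoplist, no stemming,
    # no spelling correction -- the text is taken as-is.
    return re.findall(r"[A-Za-z0-9]+", text)

print(trec_style_tokens("Pan Am Flight 103, Lockerbie (1988) -- legal actions"))
# ['Pan', 'Am', 'Flight', '103', 'Lockerbie', '1988', 'legal', 'actions']
```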
TREC Topic Statement
<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am
Flight 103 over Lockerbie, Scotland, on December 21, 1988?
<narr> Narrative:
Documents describing any charges, claims, or fines presented to
or imposed by any court or tribunal are relevant, but documents
that discuss charges made in diplomatic jousting are not relevant.
A sample TREC topic statement
16
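Since the topic statement is laid out with simple field tags, extracting its parts is straightforward. A minimal sketch, assuming topics follow the <num>/<title>/<desc>/<narr> pattern shown above; real TREC topic files add surrounding <top> markup that this sketch ignores.

```python
import re

def parse_topic(text):
    # Pull out the four fields of a topic statement.  Each field runs from
    # its tag to the next tag or to the end of the text.
    fields = {}
    pattern = (r"<(num|title|desc|narr)>\s*"
               r"(?:Number:|Description:|Narrative:)?\s*"
               r"(.*?)(?=<num>|<title>|<desc>|<narr>|$)")
    for tag, body in re.findall(pattern, text, flags=re.DOTALL):
        fields[tag] = " ".join(body.split())
    return fields

topic_409 = """<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am
Flight 103 over Lockerbie, Scotland, on December 21, 1988?
<narr> Narrative:
Documents describing any charges, claims, or fines presented to
or imposed by any court or tribunal are relevant, but documents
that discuss charges made in diplomatic jousting are not relevant."""

fields = parse_topic(topic_409)
print(fields["num"])    # 409
print(fields["title"])  # legal, Pan Am, 103
```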
TREC Experiments
1. NIST provides text corpus on CD-ROM
Participant builds index using own technology
2. NIST provides 50 natural language topic statements
Participant converts them to queries (automatically or manually)
3. Participant runs the searches and returns up to 1,000 hits to NIST.
NIST analyzes the results for recall and precision (all TREC
participants use rank-based methods of searching; see the sketch below)
17
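Because every submitted run is a ranked list, the analysis works down the ranking rather than over an unordered set. A minimal sketch with an invented run and invented judgments: precision at a cutoff k is the fraction of the top k documents judged relevant.

```python
def precision_at_k(ranked_docs, relevant, k):
    # Fraction of the top-k ranked documents that are judged relevant.
    top_k = ranked_docs[:k]
    return sum(1 for d in top_k if d in relevant) / k

# Invented run and judgments.
run = ["d7", "d2", "d9", "d4", "d1", "d8", "d3", "d6", "d5", "d10"]
judged_relevant = {"d2", "d4", "d5"}
for k in (5, 10):
    print(f"P@{k} = {precision_at_k(run, judged_relevant, k):.2f}")
# P@5 = 0.40, P@10 = 0.30
```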
Relevance Assessment
For each query, a pool of potentially relevant documents is
assembled, using the top 100 ranked documents from each
participant
The human expert who set the query looks at every document
in the pool and determines whether it is relevant.
Documents outside the pool are not examined.
In a TREC-8 example, with 71 participants:
7,100 documents in the pool
1,736 unique documents (eliminating duplicates)
94 judged relevant
18
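Pooling itself is a short computation. A minimal sketch with invented runs and a reduced pool depth: take the top documents from each run, form the set union (duplicates collapse automatically), and only that pool is shown to the assessor. The TREC-8 figures above (71 runs x 100 documents = 7,100 pooled entries, 1,736 unique documents, 94 relevant) come from the same union-then-deduplicate step at depth 100.

```python
def build_pool(runs, depth=100):
    # runs: list of ranked document-id lists, one per participant.
    # The pool is the union of the top `depth` documents from every run;
    # duplicates across runs collapse automatically in the set union.
    pool = set()
    for ranked in runs:
        pool.update(ranked[:depth])
    return pool

# Invented example with 3 tiny runs and a depth of 3.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d3", "d5", "d6"],
    ["d3", "d7", "d1", "d8"],
]
pool = build_pool(runs, depth=3)
print(len(pool), sorted(pool))   # 5 unique documents out of 9 pooled entries
# Only these pooled documents are shown to the assessor; everything
# outside the pool is treated as not relevant.
```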
A Cornell Footnote
The TREC analysis uses a program developed by Chris Buckley,
who spent 17 years at Cornell before completing his Ph.D. in
1995.
Buckley has continued to maintain the SMART software and has
been a participant at every TREC conference. SMART is used as
the basis against which other systems are compared.
During the early TREC conferences, the tuning of SMART with
the TREC corpus led to steady improvements in retrieval
efficiency, but after about TREC-5 a plateau was reached.
TREC-8, in 1999, was the final year for this experiment.
19