
Discussion Class 5
TREC
Discussion Classes
Format:
Questions.
Ask a member of the class to answer.
Provide opportunity for others to comment.
When answering:
Stand up.
Give your name. Make sure that the TA hears it.
Speak clearly so that all the class can hear.
Suggestions:
Do not be shy about presenting partial answers.
Differing viewpoints are welcome.
Question 1: Objectives
The TREC workshop series has four goals:
(a) Encourage research in text-based retrieval based on large test
collections
(b) Increase communication among industry, academia, and government
(c) Speed the transfer of technology from research labs into products
by demonstrating methodologies on real-world problems
(d) Increase the availability of appropriate evaluation techniques
What does the ad hoc task contribute to each of these goals?
Question 2: The TREC Corpus
Source                           Size (Mbytes)    # Docs   Median words/doc

Wall Street Journal, 87-89                 267    98,732                245
Associated Press newswire, 89              254    84,678                446
Computer Selects articles                  242    75,180                200
Federal Register, 89                       260    25,960                391
abstracts of DOE publications              184   226,087                111

Wall Street Journal, 90-92                 242    74,520                301
Associated Press newswire, 88              237    79,919                438
Computer Selects articles                  175    56,920                182
Federal Register, 88                       209    19,860                396
(a) What characteristics of this data are likely to impact the
results of experiments?
(b) Explain the statement, "Disks 1-5 were used as training
data."
(c) Suppose that you were designing two search engines: (i)
for use with a library catalog, (ii) for use with a Web
search service. How would your data differ from the
TREC corpus?
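
As a starting point for part (a), the figures in the table can be compared directly. The sketch below is only an illustration: it copies the first group of rows from the table and computes a rough average document size, assuming 1 Mbyte = 1000 Kbytes (the slides do not say which convention was used). The function names are made up for this example.

```python
# Figures copied from the first group of rows in the table above.
# Each entry: (source, size in Mbytes, number of documents, median words/doc)
CORPUS = [
    ("Wall Street Journal, 87-89", 267, 98_732, 245),
    ("Associated Press newswire, 89", 254, 84_678, 446),
    ("Computer Selects articles", 242, 75_180, 200),
    ("Federal Register, 89", 260, 25_960, 391),
    ("abstracts of DOE publications", 184, 226_087, 111),
]


def kbytes_per_doc(size_mbytes, num_docs):
    """Average document size in Kbytes, assuming 1 Mbyte = 1000 Kbytes."""
    return size_mbytes * 1000 / num_docs


def summarize(corpus):
    """Return (source, average Kbytes per doc, median words per doc) per row."""
    return [(src, kbytes_per_doc(mb, n), median) for src, mb, n, median in corpus]


for source, avg_kb, median_words in summarize(CORPUS):
    print(f"{source:32s} {avg_kb:6.1f} Kbytes/doc   median {median_words} words")
```

Under these assumptions the sources differ widely in average document size, which is one characteristic worth raising in the discussion.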
Question 3: TREC Topic Statement
<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am
Flight 103 over Lockerbie, Scotland, on December 21, 1988?
<narr> Narrative:
Documents describing any charges, claims, or fines presented to
or imposed by any court or tribunal are relevant, but documents
that discuss charges made in diplomatic jousting are not relevant.
A sample TREC topic statement
(a) What is the relationship between TREC topic statements and
queries?
(b) Distinguish between manual and automatic methods of query
generation.
(c) Explain the process used by the manual methods.
(d) Some of the results used a time limit (e.g., "limited to no
more than 10 minutes clock time"). What was being timed?
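
To make parts (a) and (b) concrete, here is a minimal sketch of one possible automatic method: the topic statement is split into its fields and a bag-of-words query is built from chosen fields with no human editing. This is an illustration only, not the method used by any TREC participant. The stopword list, the function names, and the choice of fields are all assumptions made for this example.

```python
import re

TOPIC = """<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am
Flight 103 over Lockerbie, Scotland, on December 21, 1988?
<narr> Narrative:
Documents describing any charges, claims, or fines presented to
or imposed by any court or tribunal are relevant, but documents
that discuss charges made in diplomatic jousting are not relevant.
"""

# Labels that follow some tags in the sample; stripped from the field text.
FIELD_LABELS = {"num": "Number:", "desc": "Description:", "narr": "Narrative:"}

# Tiny illustrative stopword list (an assumption, not a TREC resource).
STOPWORDS = {"what", "have", "from", "the", "of", "over", "on",
             "a", "an", "and", "or", "to", "by", "in"}


def parse_topic(text):
    """Split a topic statement into a dict keyed by tag name."""
    fields = {}
    # Each field runs from its <tag> to the next <tag> or the end of the text.
    for tag, body in re.findall(r"<(\w+)>(.*?)(?=<\w+>|\Z)", text, flags=re.S):
        body = " ".join(body.split())  # collapse line breaks and extra spaces
        label = FIELD_LABELS.get(tag, "")
        if label and body.startswith(label):
            body = body[len(label):].strip()
        fields[tag] = body
    return fields


def automatic_query(fields, use=("title", "desc")):
    """Build a list of distinct query terms from the chosen fields."""
    terms = []
    for name in use:
        for token in re.findall(r"[a-z0-9]+", fields.get(name, "").lower()):
            if token not in STOPWORDS and token not in terms:
                terms.append(token)
    return terms


fields = parse_topic(TOPIC)
print(automatic_query(fields, use=("title",)))
print(automatic_query(fields))
```

A manual method, by contrast, would put a person somewhere in this loop, for example to choose or reword the terms.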
Question 4: Relevance Assessments
(a) Explain the statement, "All TRECs have used the pooling
method to assemble the relevance assessments."
(b) How is relevance assessed?
(c) What is the impact of some relevant documents being missing
from the pool?
(d) What problem arises when some relevant documents in the pool
come from only a single run? How serious is this?
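
The pooling idea behind parts (a), (c) and (d) can be sketched in a few lines. This is a simplified illustration, not NIST's actual procedure: the run names, document ids, and pool depth are all hypothetical, and the function names are invented for this example. The pool is the union of the top-ranked documents from every submitted run, and only pooled documents are shown to the assessors.

```python
def build_pool(runs, depth):
    """Union of the top `depth` documents from every run for one topic.

    `runs` maps a run name to its ranked list of document ids.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


def unique_contributions(runs, depth):
    """For each run, the pooled documents that no other run contributed."""
    unique = {}
    for name, ranked_docs in runs.items():
        others = set()
        for other_name, other_docs in runs.items():
            if other_name != name:
                others.update(other_docs[:depth])
        unique[name] = set(ranked_docs[:depth]) - others
    return unique


# Hypothetical runs for a single topic (document ids are made up).
runs = {
    "run_A": ["d1", "d2", "d3", "d4"],
    "run_B": ["d2", "d5", "d1", "d6"],
    "run_C": ["d7", "d2", "d3", "d8"],
}

pool = build_pool(runs, depth=3)
print(sorted(pool))                      # documents the assessors would judge
print(unique_contributions(runs, depth=3))
```

In this toy example d4, d6 and d8 never reach the pool, so they would go unjudged, while d5 and d7 are judged only because a single run retrieved them. Those two cases correspond to questions (c) and (d).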
Question 5: Evaluation
What are:
(a) The recall-precision curve?
(b) The mean (non-interpolated) average precision?
The report commented that, "two topics are fundamental to
effective retrieval performance." What are they?
How do the automatic tests differ from the manual?
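
For parts (a) and (b), the usual definitions of these measures can be sketched as follows. This is an illustration under stated assumptions, not the official evaluation code: the ranking and relevance judgments are hypothetical, the function names are invented for this example, and the interpolation rule shown (maximum precision at any recall at or above each level) is the common textbook one.

```python
def precision_recall_points(ranking, relevant):
    """(recall, precision) measured after each relevant document retrieved."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


def average_precision(ranking, relevant):
    """Non-interpolated average precision for one topic.

    Relevant documents that are never retrieved contribute a precision of 0.
    """
    if not relevant:
        return 0.0
    points = precision_recall_points(ranking, relevant)
    return sum(p for _, p in points) / len(relevant)


def interpolated_precision(points, levels=None):
    """Precision at standard recall levels for a recall-precision curve."""
    if levels is None:
        levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


def mean_average_precision(results):
    """Mean of per-topic average precision over (ranking, relevant) pairs."""
    return sum(average_precision(rk, rel) for rk, rel in results) / len(results)


# Hypothetical topic: three relevant documents, one of them never retrieved.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}

points = precision_recall_points(ranking, relevant)
print(points)
print(average_precision(ranking, relevant))
print(interpolated_precision(points))
```

Plotting the interpolated values against the recall levels gives the recall-precision curve; averaging the per-topic average precision over all topics gives the mean.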
Question 6: The future
(a) Why was TREC-8 the last year for the ad hoc task?
(b) Does this mean that text-based information retrieval is
now solved?