Discussion Class 2
A Vector Space Model for Automatic Indexing
Discussion Classes
Format:
Questions.
Ask a member of the class to answer.
Provide opportunity for others to comment.
When answering:
Stand up.
Give your name. Make sure that the TA hears it.
Speak clearly so that all the class can hear.
Suggestions:
Do not be shy about presenting partial answers.
Differing viewpoints are welcome.
Question 1: Reading a Research Paper
(a) Who are the authors of this paper? What is their
background? Why did they write this paper?
(b) When was the paper written? What has changed
since then?
(c) What journal was the paper published in? Who
are the readers of this journal?
Question 2: Summary of the Paper
(a) What is the overall hypothesis that is examined in this
paper?
(b) How does Section 2, Correlation between Indexing
Performance and Space Density, relate to the hypothesis?
(c) How does Section 3, Correlation between Space
Density and Indexing Performance, relate to the hypothesis?
(d) How does Section 4, The Discrimination Value Model,
relate to the hypothesis?
Question 3: Document Space
How does this diagram relate to the hypothesis?
[Diagram not reproduced in this transcript.]
Question 4: Research Methodology
(a) Define precision and recall.
(b) What is a "recall-precision graph"?
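As a starting point for the discussion, here is a minimal sketch of the usual set-based definitions. It assumes that a query returns a ranked list of document identifiers and that the set of relevant documents is known in advance. The function names and the document identifiers are illustrative and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def recall_precision_points(ranking, relevant):
    """Walk down a ranked list and record a (recall, precision) pair
    each time a relevant document is encountered. Plotting these pairs
    gives one simple form of recall-precision graph."""
    relevant = set(relevant)
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


# Hypothetical ranking of five documents, three of which are relevant.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d3", "d5"]
for recall, precision in recall_precision_points(ranking, relevant):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Published recall-precision graphs are usually averaged over many queries and interpolated at fixed recall levels; the sketch omits both steps.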
Question 5: Weighting -- Term Frequency
The paper examines the effect of term weighting on
the space density of index terms.
(a) Why is this of interest in information retrieval?
(b) What form of term frequency (tf) is used in this
paper?
(c) How does this form of term frequency differ
from the standard form discussed in class?
Under what circumstances is this difference
significant?
Question 6: Discrimination Value Model
Explain the following expression, which the authors use to
measure the contribution of term k to the space density.
DV_k = Q_k - Q
What does this tell us about the discrimination value of term k?
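To make the expression concrete for discussion, the following is a rough sketch of one way to compute it. It assumes that each document is a vector of term weights, that the space density Q is the sum of cosine similarities between each document and the collection centroid, and that Q_k is the same quantity with term k removed from every vector. These choices, the function names, and the toy data are assumptions for illustration; check them against the definitions in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: sum of similarities to the centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(centroid, doc) for doc in docs)


def discrimination_value(docs, k):
    """DV_k = Q_k - Q, where Q_k is the density with term k removed."""
    without_k = [doc[:k] + doc[k + 1:] for doc in docs]
    return space_density(without_k) - space_density(docs)


# Toy collection: term 0 occurs in both documents,
# terms 1 and 2 each occur in only one document.
docs = [[1, 1, 0],
        [1, 0, 1]]
for k in range(3):
    print(k, round(discrimination_value(docs, k), 3))
```

In this toy collection, removing the shared term 0 spreads the documents apart, so DV_0 is negative. Removing term 1 or term 2 pulls them together, so DV_1 and DV_2 are positive. Under the stated assumptions, a positive value marks a term whose presence makes the space less dense.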
Question 7:
Discuss this graph
[Graph not reproduced in this transcript.]