
Discussion Class 2
A Vector Space Model for Automatic Indexing
Discussion Classes
Format:
Questions.
Ask a member of the class to answer.
Provide opportunity for others to comment.
When answering:
Stand up.
Give your name. Make sure that the TA hears it.
Speak clearly so that all the class can hear.
Suggestions:
Do not be shy about presenting partial answers.
Differing viewpoints are welcome.
Question 1: Reading a Research Paper
(a) Who are the authors of this paper? What is their
background? Why did they write this paper?
(b) When was the paper written? Since then, what
has changed about computing?
(c) Since the paper was published, what has changed
about information retrieval?
Question 2: Reading a Research Paper
Question 3: Research Methodology
Define precision and recall.
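Precision is the fraction of the retrieved documents that are relevant; recall is the fraction of the relevant documents that are retrieved. A minimal Python sketch for a single query follows. The function name and document ids are hypothetical, and returning 0.0 when a set is empty is an assumed convention (both measures are undefined in that case).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 3 of them relevant,
# out of 5 relevant documents in the whole collection.
p, r = precision_recall(["d1", "d2", "d3", "d7"], ["d1", "d2", "d3", "d4", "d5"])
print(p, r)  # 0.75 0.6
```

Retrieving more documents can only keep or raise recall, but it usually lowers precision, which is why the two are reported together.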
Question 4: Summary of the Paper
(a) What is the overall hypothesis that is examined in this paper?
(b) How does Section 2, Correlation between Indexing Performance and Space Density, relate to the hypothesis?
(c) How does Section 3, Correlation between Space Density and Indexing Performance, relate to the hypothesis?
(d) How does Section 4, The Discrimination Value Model, relate to the hypothesis?
Question 5: Document Space
Explain this diagram
Question 6: Term Frequency Weighting
The paper examines the effect of term weighting on
the space density of index terms.
(a) Why is this of interest in information retrieval?
(b) What form of term frequency (tf) is used in this paper?
(c) How does this form of term frequency differ from the standard form discussed in class? Under what circumstances is this difference significant?
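To make "the effect of term weighting on space density" concrete, here is a small Python sketch. It assumes that space density is measured as the average cosine similarity between each document vector and the collection centroid (an assumption for illustration; check it against the paper's own definition), and it compares binary weights with raw term-frequency weights on a hypothetical three-document collection. The function names and the toy documents are illustrative only, not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(vectors):
    """Assumed density measure: mean cosine similarity of each document to the centroid."""
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    return sum(cosine(v, centroid) for v in vectors) / n


def weight(docs, vocab, scheme):
    """Build document vectors with 'binary' (present/absent) or raw 'tf' (count) weights."""
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        if scheme == "binary":
            vectors.append([1.0 if t in counts else 0.0 for t in vocab])
        else:
            vectors.append([float(counts[t]) for t in vocab])
    return vectors


docs = [  # hypothetical toy collection
    "retrieval retrieval index",
    "index vector vector",
    "retrieval vector model",
]
vocab = sorted({t for d in docs for t in d.split()})
for scheme in ("binary", "tf"):
    print(scheme, round(density(weight(docs, vocab, scheme)), 3))
```

On these toy documents the two schemes give different densities, even though the set of index terms is identical; only the weights change.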
Question 7: Discrimination Value Model
Explain the following expression, which the authors use to
measure the contribution of term k to the space density.
$DV_k = Q_k - Q$
What does this tell us about the discrimination value of term k?
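A Python sketch of computing $DV_k$ for each term in a toy collection may help the discussion. It assumes Q is the mean cosine similarity of each document to the centroid and that $Q_k$ is the same quantity after term k is deleted from every document vector; both the density measure and the hypothetical weight vectors are assumptions for illustration, not the paper's data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(vectors):
    """Assumed density measure Q: mean cosine similarity of each document to the centroid."""
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    return sum(cosine(v, centroid) for v in vectors) / n


def discrimination_value(vectors, k):
    """DV_k = Q_k - Q, where Q_k is the density after deleting term k from every vector."""
    without_k = [v[:k] + v[k + 1:] for v in vectors]
    return density(without_k) - density(vectors)


# Hypothetical term-weight vectors: rows are documents, columns are terms t0..t3.
vectors = [
    [2.0, 1.0, 0.0, 1.0],
    [2.0, 0.0, 1.0, 1.0],
    [2.0, 0.0, 0.0, 3.0],
]
for k in range(len(vectors[0])):
    print(f"t{k}: DV = {discrimination_value(vectors, k):+.4f}")
```

In this sketch, t0, which has the same weight in every document, gets a negative value (removing it makes the space less dense), while t1, which occurs in only one document, gets a positive value (removing it makes the space denser).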
Question 8:
Discuss this graph