Discussion Class 7
Google
Discussion Classes
Format:
Question
Ask a member of the class to answer
Provide opportunity for others to comment
When answering:
Give your name. Make sure that the TA hears it.
Stand up
Speak clearly so that all the class can hear
Question 1: The Underlying Problem
Google and Inspec are systems that index different varieties
of full text for different groups of users.
(a) What are the differences between the items being
indexed?
(b) What are the differences between the groups of users?
(c) What do these differences imply for the designers of these
systems?
Question 2: Precision
The authors of the paper state that their objective is to
maximize precision.
(a) What do they mean by "precision"?
(b) How does their use of this term differ from the
traditional use?
(c) What is their strategy for maximizing precision?
Question 3: Ranking
Google uses at least three different ranking methods.
(a) What are they?
(b) What do you consider the impact of each?
(c) The authors criticize conventional ranking methods.
What are their criticisms? Do you agree with them?
Question 4: Scaling
Much of the article is about scalability.
(a) How many pages were they indexing when they wrote
the article? How many today? How many queries does the
system handle every day?
(b) What is their strategy for scalability? Where do you
think the limitations lie?
(c) How do they manage to implement such a large-scale
(and ever-changing) system with a small technical staff?
Question 5: Spamming
"There are even numerous companies which specialize in
manipulating search engines for profit."
(a) Explain this statement.
(b) How does Google overcome this problem?
(c) Why are the authors unenthusiastic about using metadata
for indexing the web?
Question 6: Implementation
(a) What is the function of the Google lexicon? How is it
stored?
(b) What is the function of the hit list? How is it stored?
(c) What is the function of the forward index? How is it
stored?