Inside Internet Search Engines: Products
William Chang and Jan Pedersen
Sigir’99
Web Oracle One, Two, Three...
Network of computers?
Network of hypertext?
Network of people?
The Internet... is a place where you can always find someone to help answer any question, or get anything done.
Productize that!
Who’s Who and What’s What?
Query logs
what do people look for, besides sex?
What are indexable terms? Are they unbounded?
Can you index all possible phrases?
Formatting cues help
Syntax helps
Stemming helps
Precision vs recall
WordNet -> PhraseNet?
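The stemming and precision-vs-recall bullets above can be illustrated with a small sketch. It is not the method of any particular engine: the suffix list and the `crude_stem` helper are hypothetical, far cruder than a real stemmer, and the three documents are made up for the example.

```python
from collections import defaultdict

# Hypothetical suffix list, checked in order; a real stemmer uses many more rules.
SUFFIXES = ("ing", "es", "s", "ed")


def crude_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, normalize):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


# Made-up documents for illustration only.
DOCS = {
    1: "indexing the web",
    2: "a web index",
    3: "indexes and crawlers",
}

exact = build_index(DOCS, str.lower)
stemmed = build_index(DOCS, crude_stem)

exact_hits = sorted(exact["index"])                  # only the literal match
stemmed_hits = sorted(stemmed[crude_stem("index")])  # all three variants
```

Here the exact index finds one document for "index" while the stemmed index finds all three, which is the recall gain. The cost in precision shows up with a word like "news", which this crude rule reduces to "new" and so conflates with unrelated documents.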
Who Likes What?
Too many hits!
the problem of indistinguishable scores
Spamming
the relevant and irrelevant
The web to the rescue
inside-out indexing
Citation Index or Popularity Contest?
Counting hyperlinks
Avoiding double-counting
Site clustering; what’s a site?
Judging the source
Hyperlinks revisited
Anchor text context; Yanhong Li
Why is this result hard to duplicate?
Does adding more context help?
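One way to read "Counting hyperlinks" and "Avoiding double-counting" is sketched below: each source site gets at most one vote per target, and links from a target's own site are ignored. The slide leaves "what's a site?" open, so the `site_of` rule here (hostname with a leading "www." removed) is only an assumption, and the URLs are invented.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Assumed definition of a site: the hostname without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def count_inlinks(links):
    """Return (raw link counts, counts with one vote per external site)."""
    raw = defaultdict(int)
    voters = defaultdict(set)
    for source, target in links:
        raw[target] += 1
        # Skip self-links; otherwise record the source site once per target.
        if site_of(source) != site_of(target):
            voters[target].add(site_of(source))
    deduped = {target: len(voters[target]) for target in raw}
    return dict(raw), deduped


# Invented link graph: three links from one site, one from another, one self-link.
LINKS = [
    ("http://a.example/p1", "http://c.example/"),
    ("http://a.example/p2", "http://c.example/"),
    ("http://www.a.example/p3", "http://c.example/"),
    ("http://b.example/", "http://c.example/"),
    ("http://c.example/about", "http://c.example/"),
]

raw_counts, site_counts = count_inlinks(LINKS)
```

With these links the raw count for the target is 5, while the per-site count is 2. A different definition of a site (for example, grouping by directory or by registered domain) would change the result, which is the difficulty the slide points at.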
Who asks What?
Query logs revisited
Query-based indexing – why index things people don’t ask for?
If they ask for A, give them B
From atomic concepts to query extensions
Structure of questions and answers
Shyam Kapur’s chunks
FAQs and not so FAQs
Usenet FAQs – Robin Burke’s FAQFinder
FAQ discovery
Where are the answers?
Indexing
Different ways of crawling the web
Frequency of change
Frequency of request
Managing Terabytes or GigaURLs?
Real-time indexing
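The two signals named above, frequency of change and frequency of request, could be combined into a recrawl priority along the following lines. This is a sketch under stated assumptions, not a description of any production crawler: the Poisson change model, the `refresh_priority` formula, and all the numbers are hypothetical.

```python
import heapq
import math


def refresh_priority(changes_per_day, requests_per_day, days_since_crawl):
    """Expected number of daily requests that would see a stale copy.

    Assumes page changes follow a Poisson process, so the chance that a page
    has changed since the last crawl is 1 - exp(-rate * elapsed).
    """
    p_stale = 1.0 - math.exp(-changes_per_day * days_since_crawl)
    return requests_per_day * p_stale


def pick_to_crawl(pages, budget):
    """Choose the `budget` pages with the highest refresh priority."""
    return heapq.nlargest(
        budget,
        pages,
        key=lambda p: refresh_priority(
            p["changes_per_day"], p["requests_per_day"], p["days_since_crawl"]
        ),
    )


# Invented pages: name, change rate, request rate, and age of the stored copy.
PAGES = [
    {"name": "news", "changes_per_day": 5.0, "requests_per_day": 1000, "days_since_crawl": 1},
    {"name": "archive", "changes_per_day": 0.001, "requests_per_day": 1000, "days_since_crawl": 30},
    {"name": "obscure", "changes_per_day": 5.0, "requests_per_day": 1, "days_since_crawl": 30},
    {"name": "docs", "changes_per_day": 0.1, "requests_per_day": 200, "days_since_crawl": 7},
]

chosen = [page["name"] for page in pick_to_crawl(PAGES, budget=2)]
```

Under this scoring a popular page that rarely changes and a volatile page that nobody asks for both rank low, so a budget of two goes to "news" and "docs".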
Searching
Multiway merge and scoring
Logical operations
Query parsing and phrase searching
Query refinement
Distributed searching and the perfect merge
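A minimal sketch of two items from this list, multiway merge with scoring and the merge of distributed results, is given below. It assumes posting lists sorted by document id, an OR query scored by summing per-term weights, and shards whose scores are on a comparable scale; the function names, weights, and document ids are illustrative only.

```python
import heapq
from itertools import groupby, islice
from operator import itemgetter


def score_or_query(posting_lists):
    """Merge doc-id-sorted lists of (doc_id, weight) and sum weights per doc."""
    merged = heapq.merge(*posting_lists, key=itemgetter(0))
    return [
        (doc_id, sum(weight for _, weight in group))
        for doc_id, group in groupby(merged, key=itemgetter(0))
    ]


def merge_shard_results(shard_results, k):
    """Merge per-shard lists of (score, doc_id), each sorted by descending score."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, k))


# Invented posting lists for a three-term query.
web = [(1, 1.0), (3, 0.5), (7, 2.0)]
search = [(3, 1.5), (7, 0.5), (9, 1.0)]
engine = [(7, 1.0)]
scores = score_or_query([web, search, engine])

# Invented top hits returned by two index shards.
shard_a = [(3.5, 7), (1.0, 1)]
shard_b = [(2.0, 3), (0.5, 9)]
top_hits = merge_shard_results([shard_a, shard_b], k=3)
```

Because every input is already sorted, both merges stream through the lists once without re-sorting. The distributed merge matches what a single index would return only if each shard scores with the same statistics, which is one reading of why the slide calls it the "perfect merge".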
Design Issues
Managing complexity
Managing memory
Managing parallelism
Managing data turnover
Managing scalability
Futures
Vertical markets – healthcare, real estate, jobs and resumes, etc.
Localized search
Search as embedded app
Shopping 'bots
Open Problems
Has the bubble burst?