Transcript Products
Inside Internet Search Engines: Products
  William Chang and Jan Pedersen
  Sigir’99

Web Oracle
  One, Two, Three...
  Network of computers?
  Network of hypertext?
  Network of people?
  Internet...is a place where you can always find someone to help answer any question, or get anything done.
  Productize that!

Who’s Who and What’s What?
  Query logs
    what do people look for, besides sex?
  What are indexible terms
    unbounded?
  Can you index all possible phrases?
  Formatting cue helps
  Syntax helps
  Stemming helps
  Precision vs recall
  WordNet -> PhraseNet?

Who Likes What?
  Too many hits!
    the problem of indistinguishable scores
  Spamming
    the relevant and irrelevant
  The web to the rescue
    inside-out indexing

Citation Index or Popularity Contest?
  Counting hyperlinks
  Avoiding double-counting
  Site clustering; what’s a site?
  Judging the source
  Hyperlinks revisited
  Anchor text context; Yanhong Li
  Why is this result hard to duplicate?
  Does adding more context help?

Who asks What?
  Query logs revisited
  Query-based indexing – why index things people don’t ask for?
  If they ask for A, give them B
  From atomic concepts to query extensions
  Structure of questions and answers
  Shyam Kapur’s chunks

FAQs and not so FAQs
  Usenet FAQs – Robin Burke’s FAQFinder
  FAQ discovery
  Where are the answers?

Indexing
  Different ways of crawling the web
  Frequency of change
  Frequency of request
  Managing Terabytes or GigaURLs?
  Real-time indexing

Searching
  Multiway merge and scoring
  Logical operations
  Query parsing and phrase searching
  Query refinement
  Distributed searching and the perfect merge

Design Issues
  Managing complexity
  Managing memory
  Managing parallelism
  Managing data turnover
  Managing scalability

Futures
  Vertical markets – healthcare, real estate, jobs and resumes, etc.
  Localized search
  Search as embedded app
  Shopping 'bots
  Open Problems
  Has the bubble burst?
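
The Searching slide lists "Multiway merge and scoring" without saying how it works. As a rough illustration only, and not the presenters' method, the sketch below assumes each query term has a posting list of (doc_id, weight) pairs sorted by doc_id, merges all the lists in one pass, and sums the weights per document. The function name, the data layout, and the additive scoring rule are all assumptions made for this example.

```python
import heapq


def multiway_merge_score(postings, top_k=10):
    """Merge per-term posting lists and rank documents by summed weight.

    postings: dict mapping term -> list of (doc_id, weight) pairs,
              each list sorted by doc_id (hypothetical layout).
    Returns up to top_k (doc_id, score) pairs, best score first;
    ties are broken by the smaller doc_id.
    """
    # heapq.merge streams all lists in doc_id order without concatenating them.
    merged = heapq.merge(*postings.values(), key=lambda posting: posting[0])

    scored = []
    current_doc, current_score = None, 0.0
    for doc_id, weight in merged:
        if doc_id != current_doc:
            # A new doc_id closes out the previous document's score.
            if current_doc is not None:
                scored.append((current_doc, current_score))
            current_doc, current_score = doc_id, 0.0
        current_score += weight  # assumed scoring rule: sum of term weights
    if current_doc is not None:
        scored.append((current_doc, current_score))

    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]


# Toy posting lists (made-up data for illustration).
example_postings = {
    "web": [(1, 0.5), (3, 1.0), (7, 0.25)],
    "search": [(3, 0.5), (7, 0.75)],
    "engine": [(2, 0.5), (3, 0.25)],
}
print(multiway_merge_score(example_postings, top_k=3))
# prints [(3, 1.75), (7, 1.0), (1, 0.5)]
```

Because the merge is streamed, memory use depends on the number of lists rather than their total length. A production engine would likely keep only a bounded heap of top candidates instead of sorting every scored document.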