Transcript Slide 1
For the Google-Dependent: The Other Search Engines
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library
2008

For Today . . .
Landscape of Search in 2008
Update on Established Services
New Services
Creating Custom Search Engines

Why do I need more than Google?
The Google effect
  – the single most powerful force in today’s Internet
  – a private profit-driven company
  – owns more information on individuals’ search behavior, companies and organizations than any other entity

Why do I need more than Google?
Great potential for misuse/abuse of this information for financial gain
Societies seldom leave basic services (utilities, medical and traffic regulation) totally to the “free market”
Is web search now a “basic service”???

Search dominance
Potential skewing for commercial, political or social purposes
  – Database composition
  – Ranking
  – Privacy
No single search engine can crawl the whole web
Limits search features, results display, consumer and shopping information
http://google-watch.org

Web Search in 2008

Who’s crawling the Web?
Google
Yahoo
Live Search (MSN)
Ask (owns Teoma)
Gigablast
Exalead

US Engines by Search Share, September 2007
[Bar chart: Google, Yahoo!, Live, Ask, AOL; vertical axis 0 to 60]

Size Estimates 7/9/08: Google and Yahoo! text filetypes, in millions
[Bar chart comparing Google and Yahoo!: html, htm; vertical axis 0 to 8000]

Size Estimates 7/9/08: Google and Yahoo! text filetypes, in millions
[Bar chart comparing Google and Yahoo!: pdf, doc, xls, ppt, xml/rss; vertical axis 0 to 700]

User Satisfaction: ForeSee Results and U. Michigan, 8/14/07
[Bar chart comparing ’06 and ’07 scores: Google, Yahoo!, Ask; vertical axis 66 to 82]

Convincing others ...
twingine.com – Searches Google and Yahoo, with results in separate frames
jux2.com – A meta for Google, Yahoo and Live, giving rank of each result from each service

The Latest at Established Services

Yahoo Open Strategy
Y!OS – major internal and external redesign to unify all Yahoo’s services
Owns Flickr, del.icio.us, Upcoming
“We are building social into everything we do”
Offers more control over what is shared
Easier to set up small social networks
Will open some search technology to developers and users (http://developer.yahoo.com/search/boss/)

Yahoo and the Semantic Web
Will begin to include certain metadata embedded in web pages as search and ranking elements
  – Dublin Core
  – Creative Commons
  – RDF
  – GeoRSS
  – hCard
  – hCalendar
  – hReview
  – hAtom
Will support Open Search specification allowing crawler access to deep web resources (!!!)

OpenSearch: The Invisible Web made Visible
Helps search engines and invisible web databases communicate through a common set of formats to perform search requests.
Created by Amazon.com and available through Creative Commons
Potentially one of the most significant developments for web search in the last ten years
http://www.opensearch.org

Search.yahoo.com
Search subscription content
  – Consumer Reports
  – Forrester Research
  – LexisNexis
  – FT.com
  – Factiva
  – Wall St. Jn. (30 days)
  – TheStreet.com
Yahoo! Answers answers.yahoo.com
  – Online community connecting people with questions to people wanting to answer them
  – 90 million users sharing knowledge worldwide
  – Feedback and answer reviews encouraged
Limit by Creative Commons (advanced search)

Yahoo’s Search Assist
Ajax-based service that “suggests” terms and shortcuts as you type
Activate by clicking blue arrow below the search box before searching
Also offers “Explore Concepts”, searches “Shortcuts” highly associated with the search terms

Yahoo Pipes - pipes.yahoo.com
Users can combine, filter and display any RSS content
Finished “pipes” can be shared and embedded in other web pages
  – e.g.
A pipe for RSS feeds from educational blogs, filtering for technology, physics or any other keywords
Version available for the iPhone: iphone.pipes.yahoo.com

Mashup
Web application combining data from more than one source into a single tool
Used to
  – Navigate and visualize large and/or dynamic datasets
  – Combine data with dimensions of time, distance and location
Juxtaposing data from different sources can reveal new relationships

MSN’s Live.com
Database increasing
Simpler Interface (4/08)
“Rich Answers” blended results
Image search enhancements
  – filter:face
  – filter:portrait
  – filter:bw
NLP question processing improved
Live Search Books and Search Academic ended 5/08

New & Notable at Ask
The Butler is gone! Teoma is in his place!
Smart Search
Web Answers
Zoom
Superior Mapping Tools

Gigablast
Maintains unique database
Offers advanced search features
“Freshness dating limit” estimates the date that a particular page was first published or most recently edited or modified
Custom Topic Search of Gigablast – up to 500 domains (www.gigablast.com/cts.html)

Exalead - www.exalead.com
Launched October 2004, based in France
Maintains its own database
Smaller than most US services (8 billion)
Offers “Narrowing Options”
Advanced features:
  – Phonetic spelling with “soundslike”
  – Approximate spelling with “spellslike”
Limits: Site (URL), Filetype (8), Adult content, Language (57!!!)
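
A note on phonetic matching: the “soundslike” feature listed for Exalead matches words by how they sound rather than how they are spelled. The slides do not say which algorithm Exalead uses, so the sketch below is only an illustration of the general idea, using the classic American Soundex code (an assumption chosen for simplicity, not Exalead’s method). The function name `soundex` is hypothetical.

```python
# Illustrative only: classic American Soundex, NOT Exalead's actual algorithm.
# Words that sound alike map to the same four-character code.

# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code for a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (result + "000")[:4]  # pad or truncate to letter + three digits


if __name__ == "__main__":
    for w in ("palladium", "paladium", "Robert", "Rupert"):
        print(w, soundex(w))
```

With this scheme a misspelling such as “paladium” receives the same code as “palladium”, which is the kind of match a sounds-like search aims for.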
New Services

Wikiasari: Quick rummaging search
“User ranked results”
Open source SE by Jimmy Wales and Amazon
Initial results ordered with algorithms a la Google
Users reorder results, which will be used in ranking of future similar searches

Wikiasari: Quick rummaging search
Strength is in general search topics
Deep, complex or unusual searches will not benefit as much
Intended to rival Google and Yahoo
Edits allowed on all search results
Recently launched
http://search.wikia.com

Kosmix www.kosmix.com
Google interface
Offers overview of results by document type
  – Basic Facts
  – Reviews & Opinions
  – Media
  – People & Community
  – Shopping
  – News
Extensive clustering by subject
Blended results with thumbnails of images, video and audio clips, presentations and reports
Human-created “topic pages” for subjects of current interest

icerocket www.icerocket.com
Searches
  – Blogs
  – Web
  – MySpace
  – News
  – Image
Link to cached version from Internet Archive’s Wayback Machine
Limited advanced search features
MAY be a Google interface

ChaCha www.chacha.com
Free mobile search service
Requires a (free) account
Text your questions and a human “guide” sends back an answer, limited to 160 characters
Supported by 98% of mobile providers

Clusty www.clusty.com
Metaengine
  – Source engines for the Web include Live, Ask, Gigablast, Wisenut, Open Directory
Searches Web, News, Images, Wikipedia, Blogs, Jobs
Most extensive clustering capability of any meta (Vivisimo)
Custom “Tabs” run saved searches on engines you select

Searchcrystal.com: Visual Metasearch
Options include List View, Spiral View and Cluster Display
Results common to more engines appear in the center
Color = source engine
Shape = number of engines retrieving the page
Size = rank position
Web, Blogs, Images, Tagging sites and more…

Images – Cluster view

Images – Spiral view

Custom Search Engines

Google’s Custom Search www.google.com/coop/cse/
A tool from the Google Coop initiative
Keywords chosen determine content and weighting of results (limit of 100 characters)
Search
  – Entire web
  – Your selected sites only
  – Entire web with selected sites emphasized
Within Coop, a CSE can be created and maintained collaboratively
Stored or Linked versions available

Adding sites to a Stored CSE
Manually
Using Google Marker – bookmarking tool available for Firefox and IE7
RSS feeds may be included
Add
  – Full domains www.moma.org
  – Subdomains www.moma.org/*research*
  – A single page www.moma.org/modernteachers/

Linked CSE
Sites can be added “in bulk”
Select among your sites for individual queries through specification files
Requires user to host and maintain their own XML specification files
Migration from stored to linked versions possible
More difficult to add single sites
Use Google’s Search APIs to integrate other Google services into a CSE

Other CSEs
Gigablast – Custom Topic Search www.gigablast.com/cts.html
Live Search Macros search.live.com/macros
Rollyo – searches Yahoo www.rollyo.com
Swicki www.swicki.eurekster.com

Semantic Search Systems
Understand the user’s query
Understand Web text
Bring these together for query results that are contextually relevant
Algorithms that match the meanings and not just the words
Natural Language Processing
Concept Mapping

Semantic Search Systems
Expensive and time-consuming for general web search; more possible in subject-specific contexts
“What is palladium used for?”
  – Link-based crawler results: London’s Palladium Theatre
  – Include the concept map “used for”: Sites about the element palladium
Hakia, Powerset, Cognition Search

Post Search
What do we do AFTER a search?
The search engine size wars are over
WANTED: Services that help manage, share and update
  – Web search results
  – Tagged sites
WITH scalability, confidentiality and “collaborability” across all platforms, devices and file formats

Thank You!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3552
[email protected]