For the Google-Dependent:
The Other Search Engines
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For
Rochester Regional Library Council
Member Libraries’ Staff
Sponsored by the
Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the
New York State Library 2008
For Today . . .




Landscape of Search in 2008
Update on Established Services
New Services
Creating Custom Search Engines
Why do I need more than
Google?

The Google effect: the single most powerful force in today’s
Internet
– a private profit-driven company
– owns more information on individuals’
search behavior, companies and
organizations than any other entity
Why do I need more than
Google?



Great potential for misuse/abuse of this
information for financial gain
Societies seldom leave basic services
(utilities, medical and traffic regulation)
totally to the “free market”
Is web search now a “basic service” ???
Search dominance --



Potential skewing for commercial, political or
social purposes
– Database composition
– Ranking
– Privacy
No single search engine can crawl the whole
web
Limits search features, results display,
consumer and shopping information
http://google-watch.org
Web Search in 2008
Who’s crawling the Web?






Google
Yahoo
Live Search (MSN)
Ask owns Teoma
Gigablast
Exalead
US Engines by Search Share
September, 2007
[Column chart: search share for Google, Yahoo!, Live, Ask and AOL; vertical axis 0 to 60]
Size Estimates 7/9/08
Google AND Yahoo!
text filetypes in millions
[Chart: Google vs. Yahoo! for html and htm files; vertical axis 0 to 8000]
Size Estimates 7/9/08
Google AND Yahoo!
text filetypes in millions
[Chart: Google vs. Yahoo! for pdf, doc, xls, ppt and xml/rss files; vertical axis 0 to 700]
User Satisfaction
ForeSee Results and U. Michigan 8/14/07
[Chart: satisfaction scores for Google, Yahoo! and Ask, '06 vs. '07; vertical axis 66 to 82]
Convincing others ...

twingine.com
– Searches Google and Yahoo, with results
in separate frames

jux2.com
– A meta for Google, Yahoo and Live, giving
rank of each result from each service
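A side-by-side comparison of this kind can be sketched in a few lines. The example below is only an illustration of the idea, not how twingine or jux2 work internally; the engine names and result lists are made-up sample data, and `rank_table` and `shared_results` are hypothetical helper names.

```python
from collections import defaultdict

def rank_table(results_by_engine):
    """Map each URL to {engine: 1-based rank} across all engines."""
    table = defaultdict(dict)
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls, start=1):
            table[url][engine] = position
    return dict(table)

def shared_results(table, min_engines=2):
    """URLs returned by at least min_engines engines, most widely shared first."""
    shared = [url for url, ranks in table.items() if len(ranks) >= min_engines]
    # Sort by number of engines (descending), then by summed rank (ascending)
    return sorted(shared, key=lambda u: (-len(table[u]), sum(table[u].values())))

# Made-up result lists, for illustration only
sample = {
    "google": ["a.example", "b.example", "c.example"],
    "yahoo": ["b.example", "a.example", "d.example"],
    "live": ["b.example", "e.example", "a.example"],
}
table = rank_table(sample)
for url in shared_results(table):
    print(url, table[url])
```

Pages found by only one engine drop out of the shared list, which is the point to make when convincing others: each engine returns results the others miss.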
The Latest at Established Services
Yahoo Open Strategy






Y!OS – major internal and external redesign
to unify all Yahoo’s services
Owns Flickr, del.icio.us, Upcoming
“We are building social into everything we
do”
Offers more control over what is shared
Easier to set up small social networks
Will open some search technology to
developers and users
(http://developer.yahoo.com/search/boss/)
Yahoo and the Semantic Web

Will begin to include certain metadata
embedded in web pages as search and
ranking elements
– Dublin Core
– Creative Commons
– RDF
– GeoRSS

– hCard
– hCalendar
– hReview
– hAtom
Will support Open Search specification
allowing crawler access to deep web
resources (!!!)
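To see what "metadata embedded in web pages" means in practice, here is a minimal sketch that reads an hCard out of a page fragment. It is an illustration only, not Yahoo's crawler logic: the sample HTML is invented, `HCardParser` is a hypothetical class name, and only four hCard properties (`fn`, `org`, `tel`, `email`) are handled.

```python
from html.parser import HTMLParser

# A small subset of hCard property names, chosen for this sketch
HCARD_PROPERTIES = {"fn", "org", "tel", "email"}

class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_PROPERTIES:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text seen inside a property element
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None

# Invented sample markup
sample = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Library</span>
  <span class="tel">555-0100</span>
</div>
"""
parser = HCardParser()
parser.feed(sample)
print(parser.card)
```

A search engine that reads these class names can treat "Example Library" as an organization rather than as two ordinary words on the page.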
OpenSearch: The Invisible
Web made Visible




Helps search engines and invisible web
databases communicate through a common
set of formats to perform search requests.
Created by Amazon.com and available
through Creative Commons
Potentially one of the most significant
developments for web search in the last ten
years
http://www.opensearch.org
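One of the common formats is a URL template: a database publishes a search URL with placeholders such as `{searchTerms}`, and any client fills them in. The sketch below shows the substitution step only, under simplifying assumptions; the catalog URL is invented, `fill_template` is a hypothetical helper, and namespaced or custom parameters are left untouched.

```python
import re
from urllib.parse import quote

def fill_template(template, **params):
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{(\w+)(\?)?\}", replace, template)

# Invented template for a hypothetical library catalog
template = "http://catalog.example.org/search?q={searchTerms}&page={startPage?}"
print(fill_template(template, searchTerms="palladium uses"))
print(fill_template(template, searchTerms="palladium uses", startPage=2))
```

Because the template is machine-readable, a crawler or metasearch tool can query a database it has never seen before without custom code for that site.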
Search.yahoo.com

Search subscription content
– Consumer Reports
– Forrester Research
– LexisNexis
– FT.com
– Factiva
– Wall St. Journal (30 days)
– TheStreet.com

Yahoo! Answers
answers.yahoo.com

– Online community connecting people with
questions to people wanting to answer them
– 90 million users sharing knowledge worldwide
– Feedback and answer reviews encouraged
Limit by Creative Commons (advanced search)
Yahoo’s Search Assist



Ajax-based service that “suggests”
terms and shortcuts as you type
Activate by clicking blue arrow below
the search box before searching
Also offers “Explore Concepts”,
searches “Shortcuts” highly associated
with the search terms
Yahoo Pipes - pipes.yahoo.com


Users can combine, filter and display
any RSS content
Finished “pipes” can be shared and
embedded in other web pages
– e.g. a pipe for RSS feeds from educational
blogs, filtering for technology, physics or
any other keywords

Version available for the iPhone
iphone.pipes.yahoo.com
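The combine-and-filter idea behind a pipe can be approximated in a short script. This is a rough sketch of the concept, not the Pipes service itself; the two feeds are invented inline samples, and `items_from_rss` and `pipe` are hypothetical function names.

```python
import xml.etree.ElementTree as ET

def items_from_rss(rss_text):
    """Yield (title, link) pairs from the <item> elements of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield item.findtext("title", default=""), item.findtext("link", default="")

def pipe(feeds, keywords):
    """Merge several feeds and keep items whose title mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    merged = []
    for rss_text in feeds:
        for title, link in items_from_rss(rss_text):
            if any(k in title.lower() for k in wanted):
                merged.append((title, link))
    return merged

# Invented sample feeds standing in for two educational blogs
feed_a = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Physics demos for class</title><link>http://a.example/1</link></item>
<item><title>Cafeteria menu</title><link>http://a.example/2</link></item>
</channel></rss>"""
feed_b = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Classroom technology roundup</title><link>http://b.example/1</link></item>
</channel></rss>"""

for title, link in pipe([feed_a, feed_b], ["technology", "physics"]):
    print(title, link)
```

In the real service the same steps are assembled by dragging modules in a browser, with no code required.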
Mashup


Web application combining data from
more than one source into a single
tool
Used to
– Navigate and visualize large and/or
dynamic datasets
– Combine data with dimensions of time,
distance and location

Juxtaposing data from different
sources can reveal new relationships
MSN’s Live.com
Database increasing
Simpler Interface (4/08)
“Rich Answers” blended results
Image search enhancements
filter:face filter:portrait filter:bw
NLP question processing improved
Live Search Books and Live Search
Academic ended 5/08

New & Notable at Ask





The Butler is gone! Teoma is in his place!
Smart Search
Web Answers
Zoom
Superior Mapping Tools
Gigablast




Maintains unique database
Offers advanced search features
“Freshness dating limit” estimates the
date that a particular page was first
published or most recently edited or
modified
Custom Topic Search of Gigablast – up
to 500 domains
(www.gigablast.com/cts.html)
Exalead -
www.exalead.com





Launched October 2004, based in France
Maintains its own database
Smaller than most US services (8 billion)
Offers “Narrowing Options”
Advanced features:
– Phonetic spelling with “soundslike”
– Approximate spelling with “spellslike”

Limits: Site (URL), Filetype (8), Adult
content, Language (57!!!)
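The difference between phonetic and approximate spelling matches can be shown with two standard techniques. This sketch does not reproduce Exalead's own algorithms, which are not described here: phonetic matching is illustrated with a simplified Soundex code, and approximate spelling with Python's `difflib`. The word lists are sample data.

```python
import difflib

# Soundex digit for each consonant; vowels, h, w and y have no code
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return a four-character Soundex code (simplified implementation)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = word[0].upper()
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            encoded += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            previous = digit
    return (encoded + "000")[:4]

def sounds_like(word, vocabulary):
    """Words in vocabulary that share a Soundex code with word."""
    return [w for w in vocabulary if soundex(w) == soundex(word)]

def spells_like(word, vocabulary):
    """Words in vocabulary that are close to word in spelling."""
    return difflib.get_close_matches(word, vocabulary, n=3, cutoff=0.6)

print(sounds_like("Rupert", ["Robert", "Rubin"]))
print(spells_like("exaled", ["exalead", "gigablast", "google"]))
```

Phonetic matching helps when a name was only heard, approximate matching when it was mistyped.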
New Services
Wikiasari:




Quick rummaging search
“User ranked results”
Open source SE by Jimmy Wales and
Amazon
Initial results ordered with algorithms
a la Google
Users reorder results, which will be
used in ranking of future similar
searches
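The feedback loop described above, algorithmic order first and user reordering applied to later searches, might look something like the sketch below. It is a toy model under stated assumptions, not the service's actual ranking: `FeedbackRanker` is a hypothetical class, votes apply only to the identical query (ignoring case) rather than to "similar" searches, and the URLs are sample data.

```python
from collections import Counter

class FeedbackRanker:
    """Reorder algorithmic results using promotions recorded from users."""

    def __init__(self):
        self.votes = Counter()

    def record_promotion(self, query, url):
        """A user moved this URL up for this query."""
        self.votes[(query.lower(), url)] += 1

    def rank(self, query, algorithmic_results):
        q = query.lower()
        # sorted() is stable, so results with equal votes keep algorithmic order
        return sorted(algorithmic_results, key=lambda url: -self.votes[(q, url)])

ranker = FeedbackRanker()
ranker.record_promotion("library hours", "c.example")
ranker.record_promotion("library hours", "c.example")
print(ranker.rank("Library Hours", ["a.example", "b.example", "c.example"]))
```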
Wikiasari:
Quick rummaging search






Strength is in general search topics
Deep, complex or unusual searches
will not benefit as much
Intended to rival Google and Yahoo
Edits allowed on all search results
Recently launched
http://search.wikia.com
Kosmix
www.kosmix.com


Google interface
Offers overview of results by document type
Basic Facts
Reviews & Opinions
Media
People & Community
Shopping
News



Extensive clustering by subject
Blended results with thumbnails of images,
video and audio clips, presentations and
reports
Human-created “topic pages” for subjects of
current interest
icerocket
www.icerocket.com

Searches
Blogs



Web
MySpace
News
Image
Link to cached version from Internet
Archive’s Wayback Machine
Limited advanced search features
MAY be a Google interface
ChaCha
www.chacha.com




Free mobile search service
Requires a (free) account
Text your questions and a human
“guide” sends back an answer, limited
to 160 characters
Supported by 98% of mobile providers
Clusty
www.clusty.com

Metaengine – Source engines for the Web
include
Live, Ask, Gigablast, Wisenut, Open Directory

Searches
Web, News, Images, Wikipedia, Blogs, Jobs


Most extensive clustering capability of
any meta (Vivisimo)
Custom “Tabs” run saved searches on
engines you select
Searchcrystal.com:
Visual Metasearch






Options include List View, Spiral View
and Cluster Display
Results common to more engines
appear in the center
Color=source engine
Shape=number of engines retrieving
the page
Size=rank position
Web, Blogs, Images, Tagging sites and
more…
Images – Cluster view
Images – Spiral view
Custom Search Engines
Google’s Custom Search
www.google.com/coop/cse/



A tool from the Google Coop initiative
Keywords chosen determine content and
weighting of results (limit of 100 characters)
Search
– Entire web
– Your selected sites only
– Entire web with selected sites emphasized


Within Coop, a CSE can be created and
maintained collaboratively
Stored or Linked versions available
Adding sites to a Stored
CSE




Manually
Using Google Marker – bookmarking
tool available for Firefox and IE7
RSS feeds may be included
Add
– Full domains www.moma.org
– Subdomains www.moma.org/*research*
– A single page
www.moma.org/modernteachers/
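How a site pattern decides which pages belong to the engine can be tried out locally. The sketch below uses Python's shell-style wildcard matching as a stand-in; this is an assumption for illustration and may not match Google's pattern rules in every detail. The test URLs are made up, and `in_engine` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def normalize(url):
    """Drop the scheme so URLs compare against patterns like www.moma.org/..."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url

def in_engine(url, patterns):
    """True if the URL matches any of the site patterns."""
    target = normalize(url)
    return any(fnmatchcase(target, pattern) for pattern in patterns)

patterns = [
    "www.moma.org/*research*",       # any path containing "research"
    "www.moma.org/modernteachers/",  # one exact page
]

print(in_engine("http://www.moma.org/research/library.html", patterns))
print(in_engine("http://www.moma.org/visit/", patterns))
```

Checking a handful of known URLs this way before adding a pattern helps avoid an engine that is accidentally too broad or too narrow.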
Linked CSE






Sites can be added “in bulk”
Select among your sites for individual
queries through specification files
Requires user to host and maintain
their own XML specification files
Migration from stored to linked
versions possible
More difficult to add single sites
Use Google’s Search APIs to integrate other
Google services into a CSE
Other CSEs

Gigablast – Custom Topic Search
www.gigablast.com/cts.html

Live Search Macros
search.live.com/macros

Rollyo – searches Yahoo
www.rollyo.com

Swicki
www.swicki.eurekster.com
Semantic Search Systems






Understand the user’s query
Understand Web text
Bring these together for query results
that are contextually relevant
Algorithms that match the meanings
and not just the words
Natural Language Processing
Concept Mapping
Semantic Search Systems




Expensive and time-consuming for
general web search; more possible in
subject-specific contexts
“What is palladium used for?”
Link-based crawler results:
London’s Palladium Theatre
Include the concept map “used for”
Sites about the element palladium
Hakia, Powerset, Cognition Search
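The palladium example can be turned into a toy comparison. This is a deliberately crude sketch, not how Hakia, Powerset or Cognition work: plain term counting stands in for a conventional engine, a hand-made word list stands in for the "used for" concept map, and both documents are invented.

```python
# Hand-made stand-in for a concept map; real systems derive this linguistically
CONCEPT_TERMS = {"used for": {"uses", "used", "applications", "catalyst"}}

def keyword_score(query_word, text):
    """Count how often the query word appears in the text."""
    return text.lower().split().count(query_word)

def concept_score(query_word, concept, text):
    """Require the query word, then reward words tied to the concept."""
    words = set(text.lower().split())
    if query_word not in words:
        return 0
    return 1 + len(words & CONCEPT_TERMS[concept])

# Invented documents
docs = {
    "theatre": "the london palladium theatre palladium box office palladium tickets",
    "element": "palladium is a metal with uses in catalytic converters and other applications",
}

by_keyword = max(docs, key=lambda d: keyword_score("palladium", docs[d]))
by_concept = max(docs, key=lambda d: concept_score("palladium", "used for", docs[d]))
print(by_keyword, by_concept)
```

Word matching favors the page that repeats "palladium" most often, while the concept-aware score favors the page that actually discusses uses of the element.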
Post Search
What do we do AFTER a search?


The search engine size wars are over
WANTED: Services that help manage, share
and update
– Web search results
– Tagged sites
WITH
scalability, confidentiality, “collaborability”
across all platforms, devices and file formats
Thank You!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3552
[email protected]