When Google Isn’t Enough - Louisiana Library Association


Sometimes Google
Isn’t Enough
Finding Information on the
Invisible Web
Shirley McDonald
[email protected]
Hilda Donaldson
[email protected]
First: a definition of the
Visible (Surface) Web

“It’s made up of HTML
Web pages that the
search engines have
chosen to include in
their indices. It’s no
more complicated than
that.”

Sherman and Price.
Static Web pages
Fixed, or static, pages do not change and can be linked to other pages.
Ex: http://www.truthorfiction.com
http://exploratorium.com
Dynamic Web Pages
Dynamic pages are generated only by a specific query and do not exist after that query.

www.mapquest.com

http://www.aeroseek.com/webtrax/
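The idea of a page that exists only for the duration of a query can be sketched in a few lines of Python. This is a minimal illustration, not how MapQuest or any real site is built; the `FLIGHTS` data, the `render_page` function, and the example.com URL are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical data that lives in a database, not in any stored HTML file.
FLIGHTS = {"AA100": "On time", "DL202": "Delayed"}

def render_page(url):
    """Build an HTML page on the fly from the query string in the URL."""
    query = parse_qs(urlparse(url).query)
    flight = query.get("flight", [""])[0]
    status = FLIGHTS.get(flight, "Unknown flight")
    return f"<html><body><h1>{flight}</h1><p>{status}</p></body></html>"

# The page below is created for this one request and is never saved,
# so a crawler that only follows stored links has nothing to index.
page = render_page("http://example.com/status?flight=DL202")
print(page)
```

A static page, by contrast, would be a fixed file that returns the same HTML no matter what is asked of it.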
The Invisible, Deep, or
Hidden Web


Web sites or
information that
Google or other
popular search
engines are not
capable of indexing
Websites specifically
excluded by the
search engine
Invisible (Deep or Hidden) Web




Public information is 400 to 550 times larger than on the surface web
550 billion individual documents vs. one billion on the surface web
Quality content is 1,000 to 2,000 times greater than on the surface web
95% of the Deep Web is accessible to the public (no fees or subscription required)
(Bergman)
Hidden Web sites
Opaque Web – material that can
be, but is not included in search
engine results. Ex: new material
added and not yet picked up.
Private Web – sites intentionally excluded
from search engine results. Ex: password
protected
Proprietary Web – sites that require user
registration. Ex: eBay, New York Times
Pay per click – Ex: overture.com,
FindWhat.com
Content of Databases


Information stored in tables (Access, Oracle,
SQL Server, DB2) and accessible only by query.
Examples:






Phone books, People finders
Patents, laws
Items for sale in a Web store or Web-based
auctions
Digital exhibits
Multimedia and graphical files
Stock and bond prices
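A small sketch may help show why database content stays hidden: the rows only appear in answer to a query. The example uses Python's built-in `sqlite3` module in place of Access, Oracle, SQL Server, or DB2; the `patents` table, its rows, and the `search_patents` function are invented for illustration.

```python
import sqlite3

# An in-memory database standing in for a site's back-end tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patents (number TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO patents VALUES (?, ?)",
    [("0000001", "Improved paper clip"), ("0000002", "Folding ladder")],
)

def search_patents(keyword):
    """Return the rows whose title contains the keyword."""
    rows = conn.execute(
        "SELECT number, title FROM patents WHERE title LIKE ? ORDER BY number",
        (f"%{keyword}%",),
    )
    return rows.fetchall()

# Nothing is visible until someone types a search term into the form.
results = search_patents("ladder")
print(results)
```

A crawler that cannot fill in the search form never issues the query, so it never sees the records.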
Examples of Hidden Sites
Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
Pages requiring login or registration: Blackboard, New York Times
Government publications or databases: ERIC
Online databases: Gale Research
PDF files, audio, video, any new format

More Examples
Dictionaries and thesauri
Sites that require forms to be filled out (ex: travel directions, job hunting)
Product catalogs and library catalogs
Newspaper and magazine archives
Dynamic web pages (ex: airline flight checkers, MapQuest)
Interactive tools (ex: calculators)

How are pages excluded from
search engines?

Google’s PageRank™ puts pages at the top of the hit list according to the number of times other pages link to them (popularity)

Webmasters who have figured out how to manipulate PageRank’s behavior are able to move their pages to the top of the hit list
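The link-popularity idea can be sketched with the simplified textbook form of the PageRank calculation. This is not Google's production algorithm; the four-page `links` graph, the `pagerank` function, and the choice of 50 iterations are assumptions made for the example, and 0.85 is the commonly quoted damping value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate link popularity; `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its score, split evenly, to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical mini-web: three pages link to "home", nothing links to "orphan".
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "orphan": ["home"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
worst = min(ranks, key=ranks.get)
print(best, worst)
```

In this toy graph the page with the most incoming links ranks highest and the page nobody links to ranks lowest, which is also why adding artificial incoming links can push a page up the list.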

Faulty typing and/or judgment


Search engine spiders and crawlers cannot see a site unless another site links to it
Search engines can primarily see text pages
in HTML form

This will change in the future as search engines
become more capable of retrieving the “hidden”
web
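Crawlers discover pages by following links from pages they already know, so a page with no incoming links is never reached. The sketch below imitates that process on a made-up link graph; the `WEB` dictionary, the page names, and the `crawl` function are hypothetical, and a real spider would fetch and parse HTML rather than read a dictionary.

```python
from collections import deque

def crawl(seed, links):
    """Return every page reachable from `seed` by following links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical mini-web: each page maps to the pages it links to.
WEB = {
    "portal": ["news", "library"],
    "news": ["portal"],
    "library": ["catalog-help"],
    "catalog-help": [],
    "unlinked-report": ["portal"],  # links out, but nothing links in
}

visible = crawl("portal", WEB)
invisible = set(WEB) - visible
print(sorted(invisible))
```

Note that "unlinked-report" links to the portal yet stays invisible: what matters to the crawler is links pointing to a page, not links leaving it.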

Use of blocking techniques by the
webmaster or server
Password protection
HTML blocking in the web page
A listing on the server of blocked pages
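The last two techniques are usually implemented with a robots meta tag in the page itself (for example `<meta name="robots" content="noindex">`) and with a robots.txt file on the server. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler reads such a listing; the robots.txt contents, the `ExampleBot` name, the `may_crawl` helper, and the example.com URLs are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt file as a webmaster might place it on the server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url, agent="ExampleBot"):
    """Ask whether the rules allow this crawler to fetch the URL."""
    return parser.can_fetch(agent, url)

print(may_crawl("http://example.com/index.html"))
print(may_crawl("http://example.com/private/report.html"))
```

The listing is a request rather than a lock: it keeps cooperative search engines out, while password protection is what actually prevents access.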

Searching the Invisible Web

Use the following to get around, just like
the visible web:



Directories – subject guide compiled by human
editors
Search Engines
Specialized Databases
Directories to search the
Invisible Web


Big Hub
http://www.thebighub.com/
Complete Planet: The Deep Web Directory
http://www.completeplanet.com
70,000 searchable databases and specialty search engines
Digital Librarian: A Librarian’s Choice of the Best of the Web
www.digital-librarian.com
More directories

IncyWincy: The Invisible Web Search Engine

Offers Web Search, Directory Search,
Metasearch, News
Note: Kids & Teens, Reference

http://www.incywincy.com

Invisible Web Directory

http://www.invisible-web.net/





Infomine: Scholarly Internet Resource
http://infomine.ucr.edu
Librarian’s Index to the Internet
www.lii.org
Open Directory Project (dmoz)
http://www.dmoz.org (want to edit?)
ProFusion: The Original Meta-Search Engine
http://www.profusion.com/
Search Engines for
the Invisible Web


AlltheWeb: find it all
http://www.alltheweb.com
Bright Planet
http://www.brightplanet.com/
Direct Search: SearchCenter (59 pages!)
Can get updates through emails - Resourceshelf
http://www.freepint.com/gary/direct.htm
IxQuick: the world’s most powerful metasearch engine
http://ixquick.com/
More Search Engines

Search-22




http://www.search-22.com
Search Adobe PDF Online
http://searchpdf.adobe.com/
Turbo10
http://turbo10.com
Vivisimo/Vivisimo Clustering

http://www.vivisimo.com
Specialized Databases




Library of Congress
http://catalog.loc.gov
LookSmart’s Find Articles (over 900 publications)
http://www.findarticles.com
National Science Digital Library
http://www.nsdl.org
Singing Fish – audio and video

http://www.singingfish.com
Choosing the Best Search

NoodleTools
http://www.noodletools.com/debbie/literacies/information/5locate/adviceengine.html
Great chart that connects the information need to the search strategy
How to Choose a Search Engine or Directory

http://library.albany.edu/internet/choose.html
Access to the Hidden Web is
Constantly Improving
“Google Scholar Offers Access to Academic Information,” written by Danny Sullivan, November 18, 2004
http://searchenginewatch.com/searchday/article.php/3437471



Google makes arrangements with publishers to get into password-protected sites; sometimes it shows only the abstract
Includes libraries of Oxford, Stanford, Michigan,
Harvard, NY Public
http://scholar.google.com/
Issues

“Let a Thousand Googles Bloom.” – by
Lawrence Lessig

http://www.latimes.com/news/opinion/commentary


Questions the legality and copyright issues
“Does Google move augur commercialization of libraries?” – Detroit Free Press
http://www.freep.com/news/statewire/sw108716_20041214.htm
Alternative to Google Scholar

“Internet Archive to Build Alternative to
Google.” – by Mark Chillingworth


“Ten major international libraries have agreed to combine their digitized book collections in a free text-based archive hosted online by the not-for-profit Internet Archive.”
Open Access
Bibliography









Bergman, Michael K. “The Deep Web: Surfacing Hidden Value.”
http://www.beta.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp (8
November 2004).
Cadwallader, Joy. “Searching the Invisible Web.”
http://www.inf.aber.ac.uk/academicliaison/internet/invisible.asp (4 November 2004).
Chillingworth, Mark. “Internet archive to build alternative to Google.” Information
World. http://www.iwr.co.uk/IWR/1160176. (30 December 2004).
Cohen, Laura. “How to Choose a Search Engine or Directory.”
http://library.albany.edu/internet/choose.html (4 November 2004).
“Does Google move augur commercialization of libraries?”
http://www.freep.com/news/statewire/sw108716_20041214.htm (15 December 2004).
Grimes, Brad. “Expand your Web search horizons: six tips for finding the info you
want by searching hidden corners of the Web.” PC World. June, 2002.
“Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity.”
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html (4
November 2004).
Lessig, Lawrence. “Let a Thousand Googles Bloom.”
http://www.latimes.com/news/opinion/commentary/la-oelesig12Jan12,1,1292618.story?ctrack=1 (13 January 2005).
McLaughlin, Laurianne. “Beyond Google: the web is so full of useful info that no
search engine can find it all. But a multitude of specialty sites deliver shopping advice,
reference databases, leisure-time ideas, and more – fast.” PC World. April, 2004.








Niederlander, Mary. “More on Searching: The Hidden Web or Invisible Web
Resources.” http://www.librarysupportstaff.com/hiddenweb.html (4 November
2004).
O’Leary, Mick. “Invisible Web Discovers Hidden Treasures.” Information Today.
January, 2000.
“Search Engines 101 – Search Engines Explained.”
http://www.submittoday.com/search_engines_101.htm (4 November 2004).
“Searching the Hidden Web.”
http://www2.canisius.edu/canhp/canlib/guides/hidden-web.html (4 November
2004).
Sherman, Chris and Gary Price. “The invisible web: uncovering sources search
engines can’t see.” Library Trends Fall, 2003.
Smith, C. Brian. “Invisible Web: Explore hidden troves of information.”
http://www.libraryspot.com/features/invisibleweb.htm (4 November 2004).
Sullivan, Danny. “Google Scholar Offers Access to Academic Information.”
http://searchenginewatch.com/searchday/article.php/3437471 (1 Dec. 2004).
Vine, Rita. “Going beyond Google for faster and smarter web searching.”
Teacher Librarian. October, 2004.