When Google Isn’t Enough - Louisiana Library Association
Sometimes Google Isn’t Enough
Finding Information on the Invisible Web
Shirley McDonald
Hilda Donaldson
First: a definition of the Visible (Surface) Web
“It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.”
Sherman and Price.
Static Web pages
Fixed, or static, pages do not change and can be linked to other pages.
Ex: http://www.truthorfiction.com
http://exploratorium.com
Dynamic Web Pages
Dynamic pages are generated only in response to a specific query and do not exist after that query.
www.mapquest.com
http://www.aeroseek.com/webtrax/
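To make the static/dynamic distinction concrete, here is a minimal Python sketch of a dynamic page. It assumes a hypothetical flight-status site: the FLIGHTS table, the render_page function, and the example.org URL are invented for illustration and do not describe how MapQuest or any real site is built. The point is only that the HTML is assembled when a query arrives, so a crawler has no fixed file to index.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical data that lives behind the site, not in any HTML file.
FLIGHTS = {"LA101": "On time", "LA202": "Delayed 40 minutes"}


def render_page(url):
    """Build an HTML page on the fly from the query string in the URL."""
    query = parse_qs(urlsplit(url).query)
    flight = query.get("flight", [""])[0].upper()
    status = FLIGHTS.get(flight)
    if status is None:
        body = "<p>No flight found. Enter a flight number to search.</p>"
    else:
        body = f"<p>Flight {escape(flight)}: {escape(status)}</p>"
    return f"<html><body>{body}</body></html>"


# The page for LA202 exists only while this request is being answered.
page = render_page("http://example.org/status?flight=la202")
print(page)
```

A crawler that only follows links never submits "la202", so it never sees the status page; it sees, at most, the empty search form.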
The Invisible, Deep, or Hidden Web
Web sites or information that Google or other popular search engines are not capable of indexing
Websites specifically excluded by the search engine
Invisible (Deep or Hidden) Web
Public information is 400 to 550 times larger than on the surface web
550 billion individual documents vs. one billion on the surface web
Quality content is 1,000 to 2,000 times greater than on the surface web
95% of the Deep Web is accessible to the public (no fees or subscription required)
(Bergman)
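The figures above can be checked with simple arithmetic. The short Python sketch below uses only the document counts quoted on the slide; the variable names are illustrative, and applying the 95% figure to the document count is an assumption made here for the sake of the example, not something the slide states.

```python
# Figures quoted on the slide (attributed to Bergman).
deep_web_documents = 550_000_000_000   # 550 billion
surface_web_documents = 1_000_000_000  # one billion

# How many times larger the deep web is, by document count.
size_ratio = deep_web_documents // surface_web_documents

# Assumption: read "95% accessible" as 95% of the documents.
public_documents = deep_web_documents * 95 // 100

print(size_ratio)        # 550
print(public_documents)  # 522500000000
```

The ratio of 550 corresponds to the upper end of the 400 to 550 range.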
Hidden Web sites
Opaque Web – material that can be, but is not, included in search engine results. Ex: new material added and not yet picked up.
Private Web – sites intentionally excluded from search engine results. Ex: password-protected sites
Proprietary Web – sites that require user registration. Ex: eBay, New York Times
Pay per click – Ex: overture.com, FindWhat.com
Content of Databases
Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query.
Examples:
Phone books, People finders
Patents, laws
Items for sale in a Web store or Web-based auctions
Digital exhibits
Multimedia and graphical files
Stock and bond prices
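As a rough illustration of "accessible only by query", the sketch below uses Python's built-in sqlite3 module in place of the database products named above. The table name, the patent numbers, and the titles are all invented for the example. The records sit in a table, not in Web pages, and become visible only when someone asks for them.

```python
import sqlite3


def build_catalog():
    """Create an in-memory database holding a few made-up patent records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patents (number TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO patents VALUES (?, ?)",
        [
            ("EX-001", "Folding bicycle frame"),
            ("EX-002", "Solar water heater"),
            ("EX-003", "Bicycle lamp bracket"),
        ],
    )
    return conn


def search(conn, keyword):
    """Return the records whose title contains the keyword."""
    cursor = conn.execute(
        "SELECT number, title FROM patents WHERE title LIKE ? ORDER BY number",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


conn = build_catalog()
results = search(conn, "bicycle")
for number, title in results:
    print(number, title)
```

A search engine that never types "bicycle" into the search form never receives these two records, which is why database content tends to stay hidden.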
Examples of Hidden Sites
Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
Pages requiring login or registration: Blackboard, New York Times
Government publications or databases: ERIC
Online databases: Gale Research
PDF files, audio, video, any new format
More Examples
Dictionaries and thesauri
Sites that require forms to be filled out (ex: travel directions, job hunting)
Product catalogs and library catalogs
Newspaper and magazine archives
Dynamic web pages (ex: airline flight checkers, MapQuest)
Interactive tools (ex: calculators)
How are pages excluded from search engines?
Google’s PageRank™ puts pages at the top of the hit list according to the number of times other pages link to them (popularity)
Webmasters who have figured out how to manipulate the behavior of PageRank™ are able to move their pages to the top of the hit list
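The idea of ranking by link popularity can be sketched in a few lines of Python. This is a simplified textbook version of the published PageRank idea, not Google's production ranking, which is not public. The four page names and the damping value of 0.85 are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its score with all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: three pages link to "home"; nothing links to "orphan".
links = {
    "home": ["catalog", "about"],
    "catalog": ["home"],
    "about": ["home"],
    "orphan": ["home"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The page that receives the most links ends up first, and the page that nothing links to ends up last, even though it links out to others. This is also why adding links that point at a page is the obvious way to try to push it up the list.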
Faulty typing and/or judgment
Search engine spiders and crawlers cannot see a site unless another site links to it
Search engines can primarily see text pages in HTML form
This will change in the future as search engines become more capable of retrieving the “hidden” web
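A crawler discovers pages by following links from pages it already knows, so a page that nothing links to is never reached. The Python sketch below shows this on a tiny, made-up link graph; the page names and the crawl function are hypothetical, and real crawlers are far more elaborate.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Return every page reachable by following links from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical graph: each page maps to the pages it links to.
link_graph = {
    "portal": ["news"],
    "news": ["archive"],
    "archive": [],
    "unlinked-report": ["portal"],  # links out, but nothing links to it
}
found = crawl(link_graph, ["portal"])
print(sorted(found))
```

The page "unlinked-report" stays invisible to the crawler even though it links to the portal, because no known page points to it.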
Use of blocking techniques by the webmaster or server
Password protection
HTML blocking in the web page
A listing on the server of blocked pages
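The last two techniques are commonly realized as a robots meta tag inside the page and a robots.txt file on the server; the slide does not name these mechanisms, so treating them as the intended ones is an assumption. The Python sketch below uses the standard library's urllib.robotparser and html.parser modules. The rules, the example.org URLs, and the "ExampleBot" crawler name are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the server's listing of blocked paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class RobotsMetaParser(HTMLParser):
    """Detect a <meta name="robots" content="noindex"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            content = (attributes.get("content") or "").lower()
            if name == "robots" and "noindex" in content:
                self.noindex = True


def page_blocks_indexing(html):
    """Return True if the page itself asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

print(rules.can_fetch("ExampleBot", "http://example.org/index.html"))
print(rules.can_fetch("ExampleBot", "http://example.org/private/report.html"))
print(page_blocks_indexing(
    '<html><head><meta name="robots" content="noindex"></head></html>'
))
```

Both mechanisms are requests that well-behaved crawlers honor; password protection is the only one of the three that actually prevents access.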
Searching the Invisible Web
Use the following to get around, just as on the visible web:
Directories – subject guides compiled by human editors
Search Engines
Specialized Databases
Directories to search the Invisible Web
Big Hub
http://www.thebighub.com/
Complete Planet: The Deep Web Directory
http://www.completeplanet.com
Digital Librarian: A Librarian’s Choice of the Best of the Web
www.digital-librarian.com
70,000 searchable databases and specialty search engines
More directories
IncyWincy: The Invisible Web Search Engine
Offers Web Search, Directory Search, Metasearch, News
Note: Kids & Teens, Reference
http://www.incywincy.com
Invisible Web Directory
http://www.invisible-web.net/
Infomine: Scholarly Internet Resource
http://infomine.ucr.edu
Librarian’s Index to the Internet
www.lii.org
Open Directory Project (dmoz)
http://www.dmoz.org (want to edit?)
ProFusion: The Original Meta-Search Engine
http://www.profusion.com/
Search Engines for the Invisible Web
AlltheWeb: find it all
http://www.alltheweb.com
Bright Planet
Direct Search: SearchCenter (59 pages!)
http://www.brightplanet.com/
Can get updates through emails - Resourceshelf
http://www.freepint.com/gary/direct.htm
IxQuick: the world’s most powerful metasearch engine
http://ixquick.com/
More Search Engines
Search-22
http://www.search-22.com
Search Adobe PDF Online
http://searchpdf.adobe.com/
Turbo10
http://turbo10.com
Vivisimo/Vivisimo Clustering
http://www.vivisimo.com
Specialized Databases
Library of Congress
http://catalog.loc.gov
LookSmart’s Find Articles (over 900 publications)
http://www.findarticles.com
National Science Digital Library
http://www.nsdl.org
Singing Fish – audio and video
http://www.singingfish.com
Choosing the Best Search
NoodleTools
http://www.noodletools.com/debbie/literacies/information/5locate/adviceengine.html
Great chart that connects the information need to the search strategy
How to Choose a Search Engine or Directory
http://library.albany.edu/internet/choose.html
Access to the Hidden Web is Constantly Improving
“Google Scholar Offers Access to Academic Information.” Written by Danny Sullivan, November 18, 2004
http://searchenginewatch.com/searchday/article.php/3437471
Google makes arrangements with publishers to get into password-protected sites – sometimes shows only the abstract
Includes libraries of Oxford, Stanford, Michigan, Harvard, NY Public
http://scholar.google.com/
Issues
“Let a Thousand Googles Bloom.” – by
Lawrence Lessig
http://www.latimes.com/news/opinion/commentary
Questions the legality and copyright issues
“Does Google move augur commercialization of libraries?” – Detroit Free Press
http://www.freep.com/news/statewire/sw108716_20041214.htm
Alternative to Google Scholar
“Internet Archive to Build Alternative to Google.” – by Mark Chillingworth
“Ten major international libraries have agreed to combine their digitized book collections in a free text-based archive hosted online by the not-for-profit Internet Archive.”
Open Access
Bibliography
Bergman, Michael K. “The Deep Web: Surfacing Hidden Value.” http://www.beta.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp (8 November 2004).
Cadwallader, Joy. “Searching the Invisible Web.” http://www.inf.aber.ac.uk/academicliaison/internet/invisible.asp (4 November 2004).
Chillingworth, Mark. “Internet archive to build alternative to Google.” Information World. http://www.iwr.co.uk/IWR/1160176 (30 December 2004).
Cohen, Laura. “How to Choose a Search Engine or Directory.” http://library.albany.edu/internet/choose.html (4 November 2004).
“Does Google move augur commercialization of libraries?” http://www.freep.com/news/statewire/sw108716_20041214.htm (15 December 2004).
Grimes, Brad. “Expand your Web search horizons: six tips for finding the info you want by searching hidden corners of the Web.” PC World. June, 2002.
“Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity.” http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html (4 November 2004).
Lessig, Lawrence. “Let a Thousand Googles Bloom.” http://www.latimes.com/news/opinion/commentary/la-oelesig12Jan12,1,1292618.story?ctrack=1 (13 January 2005).
McLaughlin, Laurianne. “Beyond Google: the web is so full of useful info that no search engine can find it all. But a multitude of specialty sites deliver shopping advice, reference databases, leisure-time ideas, and more – fast.” PC World. April, 2004.
Niederlander, Mary. “More on Searching: The Hidden Web or Invisible Web Resources.” http://www.librarysupportstaff.com/hiddenweb.html (4 November 2004).
O’Leary, Mick. “Invisible Web Discovers Hidden Treasures.” Information Today. January, 2000.
“Search Engines 101 – Search Engines Explained.” http://www.submittoday.com/search_engines_101.htm (4 November 2004).
“Searching the Hidden Web.” http://www2.canisius.edu/canhp/canlib/guides/hidden-web.html (4 November 2004).
Sherman, Chris and Gary Price. “The invisible web: uncovering sources search engines can’t see.” Library Trends. Fall, 2003.
Smith, C. Brian. “Invisible Web: Explore hidden troves of information.” http://www.libraryspot.com/features/invisibleweb.htm (4 November 2004).
Sullivan, Danny. “Google Scholar Offers Access to Academic Information.” http://searchenginewatch.com/searchday/article.php/3437471 (1 December 2004).
Vine, Rita. “Going beyond Google for faster and smarter web searching.” Teacher Librarian. October, 2004.