Googling
Welcome !

While you are waiting, please…

find in your packet:
Exercise 6 - Questions for the Final Exercise
“What Do You Want Google to Tell You?”

begin writing down your questions in
three or more categories
Googling
Instructor: Joe Barker
[email protected]
An Infopeople Workshop
2005
Googling
This Workshop is Brought to You By the
Infopeople Project
Infopeople is a federally-funded grant project supported
by the California State Library. It provides a wide
variety of training to California libraries. Infopeople
workshops are offered around the state and are open
registration on a first-come, first-served basis.
For a complete list of workshops, and for other
information about the Project, go to the Infopeople Web
site at infopeople.org.
Introductions

Name

Library

Position

How do you use Google?
Workshop Overview

Google’s way of “thinking”

Taking charge of the driving

Using limits to find the hard-to-get

Finding information on a subject

Special Google databases and tools

What to do when Google doesn’t work
Go to:
bookmarks.infopeople.org

Click on extreme_googling_bk.htm

Make a bookmark of this page

Add to Favorites
Exercise 1
How does Google “think” about your
searches?
Please pause and wait for discussion when
you reach a
A Close Look at Google
Search Results
• Excerpt of page with your terms
• Matched terms in bold
• Which Google database used
• Approx. # of hits
• Terms actually searched on, as Dictionary links
• URL, size, date last crawled
• Link to Cached copy
• Pages supposedly like this one
• 2nd page from
same site
• All Google pages
from this site
Don’t believe the number of Results
They are approximate, changing, and not comprehensive
Default Matching on Search Terms

Default AND between terms

Google takes a FUZZY approach




only some of the words if a page is “important”
words may occur only in pages that link to the page
words occur somewhere on the site a page belongs to
Cached reveals the page as Google found it



may differ from the current page
Cached exists if a page is full-text indexed
About 1 billion pages in Google are not cached
These pages are not fully searchable
no Cached if a page owner requests not to be cached
How Can You Know
Why Google Found a Page ?

Click Cache link toward end of results

top area often explains what was matched
Stemming
Google stems “when appropriate”
automatically detects word stem or root
retrieves with various endings

kite flying retrieves
kite: kite, kites, kiting
flying: fly, flying, flyers, flyer’s, flyers’

to turn off
+kite +flying
“kite flying”

single word searches not stemmed
Words Google Does Not Search

Common or “stop” words ignored
to be or not to be



no list of “common” terms
Google tells you below search box in results
to turn off

+to +be +or not +to +be
“to be or not to be”
single word searches possible on common words
Ranking of Results

Word order matters




favoring phrases (words together)
looks for phrases with something in place of
stop words
word repetition and proximity also count
Google ranking is a great mystery

PageRank combines many factors



popularity - links to a page and their importance
“importance” - a value of 0 (low) to 10 (high)
term placement - phrases, proximity, repetition
See Cheat Sheet #1
Google Preferences

Interface language

Selected languages for pages

SafeSearch filtering
“moderate” is default

Number of results returned
20 or 30 is best

Open new browser window for search results
Back of Cheat Sheet #1
The Google Toolbar






Search any Google databases
Search within a site
Pop-up blocker
Search history list
Set Google preferences quickly
Customizable in Options
download from toolbar.google.com

Toolbar for other browsers
download from googlebar.mozdev.org
Googling
Exercise 2
Installing the Google Toolbar
Customizing Preferences

Taking Charge of Driving Google
OR
Getting the Most
from Google’s FUZZY Thinking
Improving Google’s
“FUZZY” Default AND

Problems with AND default:

words can occur anywhere in results pages




some pages may not contain all of your words
some may not have any of your words
Use quotation marks to require words together


may have different meanings or contexts
turns common words into unique search terms
“working mothers”      145,000   (5% of the unquoted count)
working mothers      2,680,000
“dry cells”             11,500   (1% of the unquoted count)
dry cells            1,010,000
Hyphen makes phrases and searches with and
without hyphens
bite-sized retrieves bite-sized, bite sized, bitesized
Force “FUZZY” with OR Searches

Singulars and plurals not covered by
stemming
parent OR parents

Equivalent or synonymous terms
parent OR guardian

Misspellings
libarian OR librarian

Apostrophes and their misuse
april's OR aprils OR april "fools day"
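
The OR patterns above can be assembled mechanically. The following is a minimal sketch, not a Google tool: the helper name `or_group` is hypothetical, and it assumes only what the slide shows, namely an uppercase OR between alternatives and quotation marks around multi-word alternatives.

```python
def or_group(*variants):
    """Join spelling variants or synonyms with Google's uppercase OR.

    Multi-word variants are wrapped in quotation marks so that they
    are searched as phrases (hypothetical helper for illustration).
    """
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)


print(or_group("parent", "parents"))
print(or_group("libarian", "librarian"))
print(or_group("wild flowers", "wildflowers"))
```

Paste the printed string into the search box alongside your other terms.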
Ask Google to be “FUZZY”

Synonym search
~ immediately before a word

sometimes “thinks” of very broad, related terms
~food: recipes, nutrition, cooking
~facts: information, statistics
~help: guide, tutorial, FAQ, manual
Often: Terms appear in links pointing to a retrieved page
Take advantage of stemming
Let stemming handle variant endings:
“wild flowers” OR wildflowers hike “point reyes”
april OR may OR spring
hike retrieves hike, hikers, hiking, hikes
Ask for “FUZZY” Number Ranges

Numrange search uses .. (two periods, no spaces)
babe ruth 1921..1935
results have highlighted dates within this range
3..6 megapixels digital camera
most numbers will be associated with megapixels
DVD player $250..
can be open-ended -- any number above starting number
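
A small sketch of how the number-range syntax on this slide is put together. The function `numrange` and its `prefix` parameter are hypothetical names used only for illustration; the only rule taken from the slide is two periods with no spaces, with an optional missing upper bound.

```python
def numrange(low, high=None, prefix=""):
    """Format a number range such as 1921..1935, or $250.. when open-ended.

    `prefix` (for example "$") is an illustrative assumption; it is put
    in front of each number that is present.
    """
    upper = "" if high is None else f"{prefix}{high}"
    return f"{prefix}{low}..{upper}"


print("babe ruth " + numrange(1921, 1935))
print(numrange(3, 6) + " megapixels digital camera")
print("DVD player " + numrange(250, prefix="$"))
```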
The Whole-Word Wildcard:
Allowing FUZZY within “ ”

Can’t remember the exact wording in a phrase?
Who wrote something like, “The stag at night drank his
fill”?
Try searching:
“the stag * * * his fill” OR “the stag * * * * his fill”
ANSWER: “The stag at eve had drunk his fill” - in most sources
--Sir Walter Scott, “Lady of the Lake”

Construct proximity searches
"george bush"
"george * bush"
"george * * bush"
"bush george"
"bush * george"

Or try GAPS
www.staggernation.com/cgi-bin/gaps.cgi
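
The proximity phrases above follow a regular pattern, so they can be generated rather than typed. This is a minimal sketch under the slide's own assumption that each * stands for exactly one whole word inside quotation marks; the function name and parameters are hypothetical.

```python
def proximity_queries(first, second, max_gap=2, both_orders=True):
    """Build quoted phrases with 0..max_gap whole-word wildcards between two words."""
    pairs = [(first, second)]
    if both_orders:
        pairs.append((second, first))
    queries = []
    for a, b in pairs:
        for gap in range(max_gap + 1):
            words = [a] + ["*"] * gap + [b]
            queries.append('"' + " ".join(words) + '"')
    return queries


for phrase in proximity_queries("george", "bush"):
    print(phrase)
```

Run the phrases one at a time, or join a few of them with OR in a single search.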
Excluding to Control “FUZZIness”
You want: Medical info about a pancreatitis diet
Start with:
pancreatitis diet                          172,000
Eliminate undesirable words in results:
pancreatitis diet -cat -dog                132,000
pancreatitis -cat -dog -"support group"    128,000
Select exclusions carefully
Ask Google to be Very “FUZZY”:
Related & Similar

Two commands for the same function




click Similar at end of result
search related:www.infopeople.org
Sometimes hard to see how related

links to and from the target page

major words in and ranking of related pages
Possible uses

comparison shopping
find more sites like a site
related:www.econsumer.gov

use to evaluate a suspect page

Googling
Exercise 3

Taking Charge of Driving
Google
Googling
Limiting to Find
the Hard-to-Get
Limiting: Words in <Title>

intitle:

finds pages concentrated on your term
hybrid cars intitle:mileage      7,060
hybrid cars mileage            296,000

with quotes:
intitle:“cuban embargo”            581
“cuban embargo”                 28,000
with OR:
intitle:“global warming” OR intitle:“greenhouse effect”

Use allintitle: to require all words in title
allintitle: hybrid cars mileage                 86
can combine only with site:
allintitle: hybrid cars mileage -site:com       11
Exploiting a Page’s URL

Limiting to domain (edu, gov, etc):
site:edu OR site:gov OR site:ca.us

complete list at:
http://en.wikipedia.org/wiki/List_of_Internet_TLDs

Searching within a Site

site:
site:memory.loc.gov lincoln “sheet music”



works only in top/first part of URL
omit http:// and final /
turns Google into a search engine for that site, covering only the pages
Google has indexed

inurl: less specific

term may be anywhere in URLs
inurl:lincoln “sheet music”

finds “lincoln” anywhere in any URL and “sheet music”
somewhere in the pages
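
The two formatting rules for site: (omit http:// and the final /) are easy to get wrong when pasting from the address bar. A minimal sketch follows; `site_operand` is a hypothetical helper, and stripping https:// as well is an added assumption not mentioned on the slide.

```python
def site_operand(url):
    """Turn a pasted address into a site: limit.

    Drops a leading http:// (https:// too, as an assumption) and any
    trailing slash, as the slide advises.
    """
    cleaned = url.strip()
    for scheme in ("http://", "https://"):
        if cleaned.lower().startswith(scheme):
            cleaned = cleaned[len(scheme):]
    return "site:" + cleaned.rstrip("/")


print(site_operand("http://memory.loc.gov/") + ' lincoln "sheet music"')
```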
Limiting to Types of Documents

filetype:

OR to find more than one
form 1040 filetype:pdf - finds forms

-filetype:

exclude certain filetypes
form 1040 -filetype:pdf - finds help with forms

View as HTML link can be useful


avoids viruses a document might carry if opened
allows viewing without the software or reader
Caveats for Limit Commands

Cannot always be combined



link: related: must stand alone
allintitle: allintext: allinanchor: allinurl: with site: only
You can mix all other limit commands, usually:
inurl:ucla intitle:admissions statistics
intitle:“thyroid disease” site:edu OR site:com

Be careful not to ask for the impossible:
site:ucla.edu -inurl:edu
site:com site:edu site:gov

Some require understanding HTML hypertext links:

inanchor:links looks for text in link tags in the HTML code:
<a href="http://www.pancreasweb.com">Pancreatitis links</a>
<a href="www.pancreaticdisease.com/links/links.htm">Links</a>
See Cheat Sheet #3
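
Some of the caveats on this slide can be checked before a search is run. The sketch below is an illustration only, not a description of how Google parses queries: the function name and the warning wording are hypothetical, and it treats link: and related: as the stand-alone commands. It does not catch contradictions such as site:ucla.edu -inurl:edu.

```python
import re

STANDALONE = {"link", "related"}
ALLIN = {"allintitle", "allintext", "allinanchor", "allinurl"}


def limit_warnings(query):
    """Return warnings for limit-command combinations the slide advises against."""
    tokens = query.split()
    ops = []
    for token in tokens:
        match = re.match(r"-?([A-Za-z]+):", token)
        if match:
            ops.append(match.group(1).lower())

    warnings = []
    for op in ops:
        if op in STANDALONE and len(tokens) > 1:
            warnings.append(f"{op}: must stand alone")
        if op in ALLIN and any(o != op and o != "site" for o in ops):
            warnings.append(f"{op}: combines with site: only")

    # Several positive site: limits cannot all be true unless joined with OR.
    positive_sites = sum(1 for t in tokens if t.lower().startswith("site:"))
    if positive_sites > 1 and "OR" not in tokens:
        warnings.append("several site: limits need OR between them")
    return warnings


print(limit_warnings("site:com site:edu site:gov"))
print(limit_warnings("inurl:ucla intitle:admissions statistics"))
```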
Advanced Web Search page
Restricted Opportunities
Useful if you want to:
Try limiting to pages updated in 3 mos, 6 mos, year
Change language of results pages
Select from list of filetype formats
Change content filtering (also in Preferences)
I almost never use it

Not useful if you want to:
Construct complex searches
Use OR for more than one limiter
OR with phrases
multiple phrases
site:
filetype:
inurl:
Use intitle: inurl:
only the allin... commands in Advanced Search
Googling
Exercise 4

Limiting
Googling
Finding Info on a Subject
Finding Directories & Link Lists

EXAMPLE - looking for links or directories about:
“women’s history” “middle east”

Use words likely to occur in link-list or directory pages
links OR "directory of" OR guide “women’s history” “middle east”
“what’s new” OR “what’s cool” “women’s history” “middle east”

<Title> field limit to focus pages you want
intitle:links OR intitle:“directory of” OR intitle:“encyclopedia of”
“women’s history” “middle east”
intitle:“women’s history” intitle:directory “middle east”

Are there agencies or organizations with links on this topic?
inanchor:links society OR association
"middle east" "women's studies"
Be creative. Substitute database for “directory” to find searchable databases
Google’s Directory

1.5+ million pages (compare with 8+ billion in web search)

DMOZ Open Directory


Google “importance” ranking within directory
EXAMPLE:

women's history middle east OR eastern
Click on useful subject categories for more:
Science > Social Sciences > Area Studies > Middle Eastern Studies
Society > People > Women > Women's Studies > By Topic
Society > Issues > Human Rights and Liberties > Regional > Middle East
Search Google for Weblogs

Current commentary, opinions, misc. musings



Google indexes “important” blogs frequently
more than most web pages
Thorough search impossible
blog OR weblog OR “web log” your subject words
inurl:blog OR inurl:weblog your subject words

If you know the software a blog is using:
“powered by blogger” your subject words
site:blogspot.com your subject words
“powered by geeklog” your subject words

Try searching the Google Directory
Search Google Groups for Info

Usenet news groups back to 1981




archive of UNevaluated public thoughts, advice &
opinions
some not found elsewhere
select threads with more than one article for context
Search differences:



search for a group by name
search within a group
+ required for common words even in “ “
“hair loss” OR "loss +of hair" OR balding
group:alt.support.thyroid


use Advanced Search to limit by group or date posted
Create new mailing lists with registration
Google as Encyclopedic Glossary

Use the command define:[no space]


Google finds and ranks Web pages with definitions
define:internet
define:due diligence
Or build searches for pages with definitions:
internet “what is”
“what is the internet”
“internet stands +for”
internet ~beginners
internet ~FAQ

Also many common facts available:
population of japan
currency in algeria
birthplace of hitler
Exercise 5
Finding Info on a Subject
Brainstorming
How would you approach Google to solve each of the following problems?
1. Where can I find some good collections of links and information about migraine headaches?
2. I want to find websites directing me to good places for bird watching in Northern California.
3. What blogs about California and the use of blogs in libraries, particularly to keep in touch with other librarians and libraries in the state, and how they’re using blogs?
4. How can I find debates, from a wide range of perspectives, on what constitutes a near-death experience? I'm interested in proofs that what people report can be believed.
5. Size of California?
6. Birthplace of Teddy Roosevelt?
7. What is the currency of Nepal, and how much of it could US $100 buy as of January 15, 2004?

Googling
Special Google Databases
and Tools
Shortcuts and Services



Shortcuts:
dictionaries and other definitions
phonebooks - white and yellow
movie showtimes
stocks with recent news
maps, weather
converters, math problem calculators, physical constants
number searches: UPS, FedEx, USPS, VIN, UPC codes, area codes, airplane reg. #, patents, more
http://www.googleguide.com/shortcuts.html

Translate
click [Translate this page], or enter a URL or text at www.google.com/language_tools

Page Info - better to enter a URL @ alexa.com

Many search engines offer useful shortcuts & similar tools:
See Search Cheat Sheet #4 & Supplement
“Hacking” Google URLs

Structure of a Google search result URL

Your search is for:
“web searching” tutorial
http://www.google.com/search?     Google URL; ? indicates query
num=20&                           Number of results per page
hl=en&                            Interface language
lr=&                              Search language blank (ALL)
safe=off&                         SafeSearch off
q=%22web+searching%22+tutorial    Query search terms
                                  %22 means quote mark
                                  + joins terms

Will vary according to your Preferences setting
You can modify results by changing values
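
The URL breakdown above can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch for reading and changing the values; the helper names `read_params` and `change_param` are hypothetical, and the example URL is the one from this slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

URL = ("http://www.google.com/search?"
       "num=20&hl=en&lr=&safe=off&q=%22web+searching%22+tutorial")


def read_params(url):
    """Return the query parameters as a dict, keeping blank values such as lr=."""
    pairs = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in pairs.items()}


def change_param(url, name, value):
    """Return the same URL with one parameter replaced or added."""
    params = read_params(url)
    params[name] = value
    return urlunsplit(urlsplit(url)._replace(query=urlencode(params)))


print(read_params(URL))
print(change_param(URL, "num", "50"))
```

Decoding turns %22 back into a quote mark and + back into a space, which matches the notes on the slide.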
A “Hack” for Country Searches

Type the search: egypt history 1950..1970
http://www.google.com/search?num=20&hl=en&lr=&safe=off&q=egypt+history+1950..1970&restrict=countryEG



Append in Address/URL box (no spaces):
&restrict=countryEG
General format - capitalized country code:
&restrict=countryXX
Complete country codes list:
http://en.wikipedia.org/wiki/List_of_Internet_TLDs

More countries and pages than in Language Tools
search page
www.google.com/language_tools
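
The country "hack" is plain string appending, sketched below. The function name is hypothetical and the two-letter check is an added safeguard; the `&restrict=countryXX` format with a capitalized code and no spaces comes from the slide. Whether Google still honors this parameter is not something the sketch can confirm.

```python
def restrict_to_country(search_url, country_code):
    """Append &restrict=countryXX (capitalized two-letter code, no spaces)."""
    code = country_code.strip().upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter country code such as EG")
    return f"{search_url.strip()}&restrict=country{code}"


base = ("http://www.google.com/search?"
        "num=20&hl=en&lr=&safe=off&q=egypt+history+1950..1970")
print(restrict_to_country(base, "eg"))
```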
Google’s Other Proprietary Databases
Besides Web, Directory, and Groups

Images
1.3+ billion
SafeSearch filter only works in English language
Use Advanced Search forms

News
4,500 news sources
Useful, specific limit settings
30 days
international versions - other news slants

Froogle for shopping
shopping sites from Google - a subset
+ merchant uploads of catalogs not on the web
no fees, no pay for position
Catalogs (Google Labs still)


scanned mail-order catalogs (not web), text searchable
to navigate within a catalog, click an image and use the
special catalogs navigation bar
Local Information

local.google.com

“businesses & services” from Google web database +
several yellow pages




topic box
address/location box
restrict to 1, 5, 15, 45 miles away
geographic proximity, maps
EXAMPLE:
vegetarian restaurants
100 Larkin St, San Francisco, CA

maps.google.com



draggable images, satellite view
local (yellow pages), driving directions
earth.google.com


requires download, 200 MB memory
exotic toy or useful tool?
Google Labs

More upcoming Google services (beta)





Sets - create and explore sequences of things
Suggest - browse possible search terms
video.google.com – some TV programs
My search history – registration and privacy considerations

Print.google.com – search only in Print database
project to make full text books available online

Scholar.google.com – special page to search from
scholarly articles (mostly) on the web
abstracts if full text not available
integrated with OCLC for library holdings
integrated with some college campuses
See Cheat Sheet #5
Exercise 6
Where would you look?
1.
Choose ONE or TWO questions to answer
2.
Write down what you did & learned
3.
It’s O.K. to talk, ask questions, and help
each other as needed
Googling
When Google Doesn’t Work
Other Effective Search Engines

Yahoo Search (3+ billion)

no 10-word limit

accepts ( ) around Boolean OR
(“global warming” OR “greenhouse effect”)
(site:edu OR site:gov OR site:uk)


pay-for-position sites not identified
Teoma (1+ billion)

popularity within subjects

sometimes finds link collections as Resources
Bookmarklets for Searching
JavaScript applications that reside in your Bookmarks or Favorites (Favlets)
Search engine tools:
run a search in another search engine
@Teoma @Yahoo!
search highlighted text in a search engine
Information and more about them at
searchengineshowdown.com/bmlets
Recommended Directories

By library people
LII.ORG
Academic Info
Infomine

Complement to searching
when search engines do not seem to work
when you know or have a hunch there is a site about your question

Thinking in Sync with Search Engines

Search engine balancing act:

Do we agree with Google’s “importance”?




tyrannical or democratic?

favors established more than new websites

favors trendy, high-speed, consumer, vroom & zoom
Are Google’s secretiveness & fuzziness trustable?
Have search engines changed us?

Do we accept “good enough” quicker?

Have we given up “thorough” and “certain”?
Will semantic & linguistic analysis help?

Or bring in a new age of “whatever” thinking
Googling
Exercise 7

Make your own Cheat Sheet

Write down up to seven things you want to
remember to do or practice

Circle the ONE you like most
Googling
Workshop Evaluation
infopeople.org/WS/eval