Online Databases and the Internet


Internet Searches
[Diagram: User, Communication Connection, Vendor, Large Computer]
The Internet as a Resource

Information is generally not reviewed
You decide what is junk
Really useful information is hard to find (dilution problem)
Search Concepts

Precision (or relevance): the fraction of the documents retrieved in a search that are useful
[Diagram: useful documents shown within the set of all documents]
Search Concepts

Recall: the fraction of useful documents retrieved out of all useful documents in the database
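Written as formulas, with R standing for the set of documents a search retrieves and U for the set of useful documents in the database (symbols chosen here for illustration), the two definitions are:

```latex
\text{Precision} = \frac{|R \cap U|}{|R|}, \qquad \text{Recall} = \frac{|R \cap U|}{|U|}
```

For instance, if a hypothetical search retrieves 20 documents, 15 of them useful, and the database holds 60 useful documents in total, precision is 15/20 = 0.75 and recall is 15/60 = 0.25.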
Problem

A search that is too narrow may have high relevance (precision) but low recall
In chemistry terms: high purity but low percent yield
Problem

A search that is too broad may have high recall but low precision (most of the useful documents are retrieved, but many irrelevant ones are retrieved as well)
In chemistry terms: low purity but high percent yield
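The trade-off can be checked with a small Python sketch. The document numbers are hypothetical: documents 1 to 10 are assumed to be the useful ones, the "narrow" search returns three of them, and the "broad" search returns all ten plus forty irrelevant documents.

```python
def precision_recall(retrieved, useful):
    """Return (precision, recall) for a set of retrieved document IDs."""
    hits = retrieved & useful  # useful documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(useful) if useful else 0.0
    return precision, recall

# Hypothetical database in which documents 1-10 are the useful ones.
useful = set(range(1, 11))

narrow = {1, 2, 3}         # too narrow: few results, all of them useful
broad = set(range(1, 51))  # too broad: every useful document plus 40 others

print("narrow:", precision_recall(narrow, useful))  # (1.0, 0.3)
print("broad: ", precision_recall(broad, useful))   # (0.2, 1.0)
```

The narrow search is "pure" (precision 1.0) but finds only 30% of what is there; the broad search finds everything (recall 1.0) but four out of five results are junk.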
Searching on the Internet

If you can find it here, you can find it anywhere
Lots of dross, little gold for scientists
Search Engines

“Spiders” create massive indexes of the words on pages
Search Engines

Word 1 on page A
Word 2 on page A
Word 2 on page B
…
Word 1 on page K
…
Searching occurs on the index: a search on word 1 returns references to pages A and K
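A minimal Python sketch of such an index (often called an inverted index), assuming each page is just a string of words; real spiders also deal with punctuation, word positions, and millions of pages. Here the hypothetical word "comets" plays the role of word 1 and "asteroids" the role of word 2.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page names on which it appears."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

# Hypothetical pages standing in for what a spider has collected.
pages = {
    "A": "comets and asteroids",
    "B": "asteroids near earth",
    "K": "comets in the outer solar system",
}

index = build_index(pages)
# The search consults only the index, not the pages themselves.
print(sorted(index["comets"]))     # ['A', 'K']
print(sorted(index["asteroids"]))  # ['A', 'B']
```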
Search Results

Summary information (its source depends on the search engine)
A link to the page
A relevance rating
Summary Information

Excite: taken from the page itself, possibly from several sentences deep, selected by a keyword algorithm
May be irrelevant
Summary Information

HotBot: taken from the description tags or from the top headings
May produce gibberish if the page was poorly built
Relevance Rating

Count the total number of times the search word occurs on the page
A higher frequency gives a higher relevance
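A sketch of this frequency-based rating in Python. The pages and the tokenizing rule (runs of lowercase letters and apostrophes) are assumptions for illustration, not any engine's actual method.

```python
import re

def frequency_score(text, word):
    """Relevance = total number of times the search word occurs on the page."""
    words = re.findall(r"[a-z']+", text.lower())
    return words.count(word.lower())

# Hypothetical pages.
pages = {
    "A": "Volcanoes erupt. This page covers volcanoes and more volcanoes.",
    "B": "A travel guide that mentions volcanoes once.",
}

# Rank pages from highest to lowest score.
ranked = sorted(pages, key=lambda p: frequency_score(pages[p], "volcanoes"), reverse=True)
print(ranked)  # ['A', 'B']
```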
Relevance Rating

Count the occurrences of the search word in the first “n” words on the page
The assumption is that the word occurs early in the document if it is the main subject of the document
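The "first n words" rule can be sketched the same way. The default of n = 50 is an arbitrary assumption; the example uses n = 5 so the effect is visible. Page B mentions glaciers three times, so a plain frequency count would rank it above page A, while the first-n rule ranks A higher.

```python
import re

def early_score(text, word, n=50):
    """Relevance = occurrences of the search word within the first n words."""
    words = re.findall(r"[a-z']+", text.lower())
    return words[:n].count(word.lower())

# Hypothetical pages: A is about glaciers, B only mentions them at the end.
pages = {
    "A": "Glaciers shape valleys. Glaciers move slowly over many years.",
    "B": "Our hiking club visited several parks this summer and saw glaciers glaciers glaciers.",
}

print(early_score(pages["A"], "glaciers", n=5))  # 2
print(early_score(pages["B"], "glaciers", n=5))  # 0
```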
Advanced Searches

Search on phrases, requiring that EVERY word of the phrase be present on the page in the EXACT ORDER entered
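Phrase matching amounts to looking for the phrase's words as one consecutive run in the page's word sequence. A simple Python sketch, assuming pages are plain text:

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_phrase(text, phrase):
    """True if every word of the phrase appears, adjacent and in the same order."""
    words, target = tokenize(text), tokenize(phrase)
    k = len(target)
    return k > 0 and any(words[i:i + k] == target for i in range(len(words) - k + 1))

print(contains_phrase("His batting average rose", "batting average"))   # True
print(contains_phrase("The average batting order", "batting average"))  # False
```

The second page contains both words, so a plain "and" search would match it, but the phrase search rejects it because the words are not adjacent in the entered order.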
Advanced Searches

Boolean
• and (+)
• or
• not (-)
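Because an index maps each word to a set of pages, the Boolean operators correspond to set operations: and is intersection, or is union, and not is difference. A sketch with a small hypothetical index:

```python
# Hypothetical inverted index: word -> set of pages containing it.
index = {
    "baseball": {"A", "B", "C"},
    "softball": {"C", "D"},
    "tickets": {"B", "D"},
}

def pages_with(word):
    return index.get(word, set())

and_result = pages_with("baseball") & pages_with("tickets")   # both words
or_result = pages_with("baseball") | pages_with("softball")   # either word
not_result = pages_with("baseball") - pages_with("tickets")   # baseball but not tickets

print(sorted(and_result))  # ['B']
print(sorted(or_result))   # ['A', 'B', 'C', 'D']
print(sorted(not_result))  # ['A', 'C']
```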
General Tips

Avoid single search terms
For information on the White Sox, use: +baseball +"White Sox" +"batting average"
General Tips

Avoid common words as search terms (they appear on almost every page, so they do not narrow a search)
General Tips

Use and (+) to require that words be on the page
Use not (-) to omit pages with a specified word on them
General Tips

Require exact matching of all words in a phrase by enclosing the phrase in double quotes
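The tips above can be combined in one query. The Python sketch below shows one way such a query might be broken into required, excluded, and optional terms; the exact syntax each engine accepts varies, so treat this as an illustration rather than any particular engine's parser.

```python
import re

def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms.
    A double-quoted phrase is kept together as a single term."""
    required, excluded, optional = [], [], []
    # Optional +/- prefix, then either a "quoted phrase" or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = phrase or word
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

print(parse_query('+baseball +"White Sox" +"batting average" -softball'))
# (['baseball', 'White Sox', 'batting average'], ['softball'], [])
```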
Evaluating Search Engines
http://webdir.euroseek.com/top/catid=5042/ilang=en
Choosing the Best Search Tool to Start With

Directories and Virtual Libraries
• Contain topical lists of selected resources, hierarchically arranged
• Many directories rate and review Web resources
• Meant to be browsed, but can also be searched by keyword
• Depend on people for selection and control of the included resources
• Usually best for broad topics
• Updated less frequently than search engine databases
• Tend to have small databases
• Contain links to subject guides and specialized databases
Best tool (cont)

Search Engines
• Attempt to index as much of the Web as possible
• Most are full-text databases
• Require knowledge of search techniques to get good results
• Databases are created by computer programs called spiders or robots
• Most often used for multifaceted topics and obscure subjects
• Search very large databases that are updated frequently
Search Tools (cont)

Meta-search tools
• Some allow you to search several search engines simultaneously
• Some supply lists of databases that can be searched directly from their pages
• Provide a good way to keep up with new search engines
• May not fully exploit the features of individual search engines, so you must keep your search simple
Search Tools (cont)

Intelligent agents
• Software entities that perform tasks on your behalf
• Some can send the requested information on a schedule that you define
• Sometimes referred to as bots or robots
• Agents work with some degree of autonomy
Web directory vs. search engine

Directory
• Has a structure that can be browsed
• Editors generally decide where to list each site within the categories

Search engine
• You search a search engine on any subject by using keywords
• Usually runs an automated program that continually scans web content, page by page, storing the information it finds along the way
• Results are returned to you ranked by relevance (for example, the number of times your search words appear on a given page)

Different uses of directories vs. search engines

If you are looking for a specific site (known by name), use a directory
If you are looking for a list of sites on the same subject, use a directory
If you are looking for a specific page within a site, use a search engine
Yahoo

http://www.yahoo.com
Use Help to find out how to search
Has a directory, which is an online guide to the WWW created by editors who visit and evaluate web sites
Yahoo’s directory can also be browsed
Lycos

http://www.lycos.com
Offers different pages for searching different kinds of information
AltaVista

http://www.altavista.com
Video Help
Google

http://www.google.com
Google Advanced Image Search
Google Groups search
Google Directory
Search.com

What is new about Search.com?


Search.com uses SavvySearch metasearch technology to give
you the most comprehensive search results on the Internet.
Now you only have to search once for results from over 1,000
search engines, Web directories, auctions, storefronts, news
sources, discussion groups, reference sites, and more. In
addition, Search.com's metasearch channels give you the
power to get specialized results from dozens of search
engines dedicated to popular topics, like Music or Travel.
How does Search.com work?

Search.com sends your search query to several search
engines at one time and integrates the results into one list.
That means you don't have to spend time going to each
individual site to search — you can get all the Web's top
results at once from Search.com
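A rough sketch of the idea in Python: the query goes to several engines at the same time and the result lists are merged into one, with duplicates removed. The two "engines" are hypothetical stand-ins returning fixed URLs, and the round-robin merge (first results from every engine, then second results, and so on) is just one simple merging rule, not necessarily the one Search.com uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines; each returns a ranked list of URLs.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]

def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]

def metasearch(query, engines):
    """Query every engine at the same time, then merge into one de-duplicated list."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank: first results from every engine, then second results, ...
    for rank in range(max((len(r) for r in result_lists), default=0)):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

print(metasearch("earthquakes", [engine_a, engine_b]))
# ['http://a.example/1', 'http://shared.example/x', 'http://b.example/1', 'http://a.example/2']
```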
Search.com

Can I search in specific categories?
Of course. Just choose a metasearch channel from
our main page. Each of our channels searches
specialized engines related to that particular topic or
genre. For example, our Music channel gives you
the option to search for MP3s, lyrics, reviews, prices,
software, concert tickets, and more. You can select
as many categories as you like on a particular
channel. The more checkboxes you select, the more
engines we search.
Search.com

Understanding search results

At the top of search results pages, you'll see a list of the sites we searched to get
your results. (A green bullet means we successfully contacted the site; a gray
bullet means we did not get a response in time.) Below that list your results
appear, organized into groups of similar information. For example, if you do a
search on earthquakes, you'll see links to the earthquake category pages found
on human-built Web guides like Yahoo! and Snap under the heading Directories,
earthquake sites under Web Pages, earthquake news under Headlines, and
prices on earthquake books, videos, and other products under Latest Prices.
If you'd like more results within a certain grouping — Web pages, for example —
just click the link that says See more Web pages results. On that page, your
results will be grouped according to which site they came from. If you want to see
more results from a particular site, click the link that says See all matching
results from the site you're interested in. And if you didn't find what you were
looking for, you can always perform another search from the search box at the
bottom of the page, or choose another channel to search from. Results you get
on our search channels are organized similarly, based on which checkboxes you
select before you perform your search.

Search.com

Advanced search tips

Some engines that Search.com queries support the following
advanced search options:
• double-quoted phrases: " " (e.g., "john lennon" not john
lennon)
• enforced term operators: +/- (e.g., music +"john lennon" beatles)
• boolean language: and, or, not (e.g., music and "john lennon"
and not beatles)
You may improve your results through these advanced search
techniques, but many engines do not support these options, so
sometimes your results may appear to be exactly the same as
they would be without them. Try out a few different approaches
and see what works best for you.
Virtual Libraries

Directories that contain collections of resources that librarians and
information specialists have carefully chosen and organized in a
logical way
Smaller than general directories, since resources are carefully selected
Examples
• The Argus Clearinghouse: http://www.clearinghouse.net
• INFOMINE: http://infomine.ucr.edu
• Librarians’ Index to the Internet: http://lii.org
• The Internet Public Library: http://www.ipl.org
• WWW Virtual Library: http://www.vlib.org
Three Major Types of Information in a Virtual Library

Subject Guides (meta-pages)
• A WWW resource devoted to collecting hyperlinks to most Web pages on a particular subject

Reference Works
• A full-text document with self-contained information

Specialized Databases
• An index that catalogs certain material; searched by keyword
More Information on Search Engines

Tools that use computer programs called spiders or robots to automatically gather information on the Internet to create databases.
Search Engines

AltaVista: http://www.altavista.com
Excite: http://www.excite.com
Google: http://www.google.com
HotBot: http://hotbot.lycos.com
Infoseek: http://infoseek.go.com
Lycos: http://lycos.com
Northern Light: http://www.northernlight.com
Webcrawler: http://www.webcrawler.com
Types of searching

Boolean: AND, OR, NOT
Phrase: put " " around the phrase; the words must appear adjacent to each other in the document
Meta-Search Tools


Allow you to search several search engines simultaneously
Some Meta-Search Tools

All-in-One Search Page: http://www.allonesearch.com
Dogpile: http://www.dogpile.com
SavvySearch: http://www.savvysearch.com
ProFusion: http://www.profusion.com
MetaCrawler: http://www.metacrawler.com
Fossick.com: http://www.fossick.com
SEARCH.COM: http://www.search.com
Intelligent Agents




Software entities that gather information or accomplish tasks without your immediate presence.
You tell the agent what action you want performed, and the agent then carries it out on your behalf.
They can perform repetitive tasks of searching databases, retrieving and filtering information, and delivering information back to you.
Example

MyPage from Go Network
• You access the service, personalize it, and use the username and password you create to get to your information
• You set up an agent that searches continuously for information on your topic and alerts you when it finds something
• You can leave it working and come back a week later to see what it has turned up
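The behavior described above (search repeatedly, alert only on something new) can be sketched in a few lines of Python. The search function is a hypothetical stand-in for a real search service, and collecting alerts in a list stands in for e-mail or other notifications. Running weekly would mean an interval of 7 * 24 * 3600 seconds; the example uses 0 so it finishes immediately.

```python
import time

def run_agent(search, topic, checks, interval_seconds):
    """Run the same search `checks` times, pausing between runs,
    and record only results that have not been seen before."""
    seen = set()
    alerts = []
    for check in range(checks):
        new = [url for url in search(topic) if url not in seen]
        if new:
            alerts.append(new)  # a real agent would notify the user here
            seen.update(new)
        if check < checks - 1:
            time.sleep(interval_seconds)  # wait until the next scheduled run
    return alerts

def make_fake_search():
    """Hypothetical search service whose results grow over time."""
    batches = iter([
        ["http://news.example/1"],
        ["http://news.example/1", "http://news.example/2"],
        ["http://news.example/1", "http://news.example/2"],
    ])
    return lambda topic: next(batches)

alerts = run_agent(make_fake_search(), "comet sightings", checks=3, interval_seconds=0)
print(alerts)  # [['http://news.example/1'], ['http://news.example/2']]
```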

Domain types

.edu
• Educational (can be anything from university research to a student’s or faculty member’s home page)
.gov
• Governmental (usually contains reliable data)
.com
• Commercial (may be trying to sell a product)
.net
• Network (may provide services to commercial or individual customers)
.org
• Organization (usually created by a nonprofit institution; may be trying to persuade the reader; may be biased)
.mil
• United States military sites, agencies, and some academics
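As a rough first check, a URL's domain type can be read off programmatically. This Python sketch covers only the six types listed; anything else (for example, country codes such as .uk) is reported as "Other", and the domain is only one clue about a site's reliability.

```python
from urllib.parse import urlparse

# Categories from the list of domain types.
DOMAIN_TYPES = {
    "edu": "Educational",
    "gov": "Governmental",
    "com": "Commercial",
    "net": "Network",
    "org": "Organization",
    "mil": "United States military",
}

def domain_type(url):
    """Return the category for a URL's top-level domain, or 'Other'."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return DOMAIN_TYPES.get(tld, "Other")

print(domain_type("http://infomine.ucr.edu"))  # Educational
print(domain_type("http://www.search.com"))    # Commercial
print(domain_type("http://www.ipl.org"))       # Organization
```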
10 Basic Steps to a Good Search Strategy

1. Identify the important concepts of your search
2. Choose the keywords that describe these concepts
3. Determine whether there are synonyms, related terms, or other variations of the keywords that should be included
4. Determine which search features may apply, including truncation, phrase searching, Boolean operators, etc.
5. Choose a search engine
10 steps (cont)

6. Read the search instructions on the search engine’s home page. Look for sections entitled “Help”, “Advanced Search”, “Frequently Asked Questions (FAQ)”, etc.
7. Create a search expression using syntax that is appropriate for the search engine
8. Evaluate the results. How many hits were returned? Were the results relevant to your query?
9. Modify your search if needed. Go back to steps 2 through 4 and revise your query accordingly.
10. Try the same search in a different search engine, following steps 5 through 9.
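Steps 1 through 4 and step 7 can be illustrated by turning a list of concepts and their synonyms into a single Boolean expression: synonyms for the same concept are joined with OR, concepts are joined with AND, and multi-word terms are quoted as phrases. This Python sketch assumes an engine that accepts uppercase AND/OR with parentheses; as step 6 says, check each engine's own syntax.

```python
def build_query(concepts):
    """Combine concepts with AND; synonyms for one concept are combined with OR.
    Multi-word terms are quoted so they are searched as phrases."""
    def term(t):
        return f'"{t}"' if " " in t else t

    groups = []
    for synonyms in concepts:
        joined = " OR ".join(term(t) for t in synonyms)
        groups.append(f"({joined})" if len(synonyms) > 1 else joined)
    return " AND ".join(groups)

# Hypothetical concepts for a search on climate effects on coral reefs.
concepts = [
    ["global warming", "climate change"],  # concept 1 and a synonym
    ["coral reefs"],                       # concept 2
]
print(build_query(concepts))
# ("global warming" OR "climate change") AND "coral reefs"
```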
Guidelines for Evaluation of Internet Resources

Who is the author or institution?
• If the resource was written by an individual, does it offer or give links to biographical information about the author?
• If an institution produced the resource, does it provide links to information about itself, including its purpose, history, and street address?
• Have you seen the author’s or institution’s name cited in other sources or bibliographies?
• If the page is part of a larger institution’s Web site, does the institution appear to filter the information that appears at its site?
• What clues does the URL give you about the source’s authority?
  • A tilde (~) in the URL often indicates a personal page
  • Pay attention to the domain name
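The tilde clue is easy to check mechanically. A small Python sketch; it is only a heuristic, since not every personal page uses a tilde, and the example.edu URLs are made up:

```python
from urllib.parse import urlparse

def looks_personal(url):
    """Heuristic: a '~' in the URL path (e.g. /~jsmith/) often marks a personal page."""
    return "~" in urlparse(url).path

print(looks_personal("http://www.example.edu/~jsmith/notes.html"))  # True
print(looks_personal("http://www.example.edu/library/guide.html"))  # False
```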

Guidelines (cont)

Who is the audience?
• Is the Web page intended for the general public, or is it meant for scholars, practitioners, children, and so forth?
• Is the audience clearly stated?
• Does the Web page meet the needs of its stated audience?

What is the purpose of the information?
• Is the purpose of the information to inform, explain, persuade, market a product, or advocate a cause?
• Is the purpose clearly stated?
• Does the resource fulfill the stated purpose?
Guidelines (cont)

Is the content accurate, objective, and supported by other sources?
• Are there political, ideological, cultural, religious, or institutional biases?
• Is the content intended to be a brief overview of the topic or an in-depth analysis?
• If the information is opinion, is this clearly stated?
• If there are facts and statistics included, are they properly cited?
• Is it clear how the data were collected, and are they presented in a logical, organized way?
• Is there a bibliography at the end of the document?
Guidelines (cont)

How current is the information?
• Does the Web page have a date that indicates when it was placed on the Web?
• Is it clear when the page was last updated?
Citing a Web Reference

There are three major styles:
• APA (American Psychological Association): http://www.apa.org/journals/webref.html
• MLA (Modern Language Association): http://www.mla.org/style/sources.htm
• Chicago Manual of Style

Elements the styles agree on
• Author’s name
• Document title
• Title of larger or complete work, if relevant
• Date of publication or last revision
• Date page was accessed
• URL
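A minimal Python sketch that strings these common elements together in the order listed. It follows no particular style: APA, MLA, and Chicago each specify their own punctuation, ordering, and date formats, so consult the style guide for real citations. The author, title, and URL in the example are invented.

```python
def web_citation(author, title, published, accessed, url, larger_work=None):
    """Join the common citation elements in the order listed, skipping
    the larger work when it does not apply. Generic format only."""
    elements = [author, f'"{title}"', larger_work, published,
                f"accessed {accessed}", f"<{url}>"]
    return ", ".join(e for e in elements if e) + "."

print(web_citation("Doe, Jane", "Searching the Web", "2001", "12 March 2002",
                   "http://www.example.edu/~jdoe/search.html"))
# Doe, Jane, "Searching the Web", 2001, accessed 12 March 2002, <http://www.example.edu/~jdoe/search.html>.
```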
