2003-03-china-retrie.. - Vrije Universiteit Brussel

1
Information retrieval
• Vrije Universiteit Brussel
• Information- and Library Science, University of Antwerp(en),
Belgium
Lectures presented in universities in China, March 2003.
These slides are available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
2
Contents / summary of this presentation
1. About “information”
2. Databases and computerized information retrieval
3. Classifications, and thesaurus systems
4. Internet
5. World-Wide Web
6. Online access information sources and services!
3
****
About “information”
Information concepts
4
***-
Our world:
future trends
Future trends in our world -> Answers / Requirements / Solutions / Reactions
• Complexity -> Knowledge and skills
• Dynamics and evolution, speed and acceleration -> Adaptability, flexibility
• Internationalization, globalization, mobility -> Global co-operation
• Economic products less based on natural resources and more on “knowledge” -> Education, research, exploitation of knowledge is important
5
***-
!? Question !?
Compare “information”
for instance with “bananas”.
6
***-
Information versus other products
= bits versus atoms
• The essential difference between information and other
economic or natural products is that
information on computers (such as databases)
consists of bits (and bytes), for example 01010101101011010010,
while other economic / natural products (such as bananas)
consist of atoms.
• This has many interrelated consequences.
7
***-
Information:
some strange properties (Part 1)
• Information is never consumed and does not deteriorate.
Nevertheless, information can become obsolete;
speed of delivery can be crucial. The context is important.
• There is no agreed measure of a unit of information.
• The price of an information item is not well linked to its
value in a particular situation.
Moreover, one cannot well quantify the benefit/value of
information.
8
***-
Information:
some strange properties (Part 2)
• One information item can be available to different
persons at the same time.
Information can be well reproduced, which makes it
cheap for wide consumption.
However, copyright can keep the price high.
• Most digital information items (documents) can be
changed, modified, falsified, manipulated… more easily than
physical products/items.
”Is this document real, authentic, original?”
9
***-
Information sources:
people and documents
• Information sources come essentially in two formats:
»less formal: people communicating by
—telephone
—electronic mail,…
»more formal: documents such as
—hard copy documents
—electronic, digital documents; computer-based files
• Here we focus mainly on information that is stored in
documents.
10
****
The flow of documentary information
with primary and secondary sources
Author /
Creator /
Sender
Primary sources / systems: mainly
Journal articles / Books /
Electronic mail / Online sources /...
Secondary sources / systems: mainly
Reference works (printed, CD-ROM, online)
Library catalogues, including OPACs...
Reader /
User /
Receiver
11
****
The role of secondary information
sources
• The secondary information flow is generated on the basis
of the primary flow, mainly because the great amounts of
primary information lower the chance to retrieve and use
the appropriate information item.
• Secondary information tries to bring some order in the
great chaos.
12
****
Various categorisations of
documentary information sources
Information sources can be categorised in various ways.
For instance:
• Books / Serials
• Primary / Secondary
• Text / Image / Sound / Animation/video / Software / Data / Interactive
• Hard copy (not digital) / Digital
• Offline / Online
13
****
Retrospective searching versus
current awareness: scheme
Retrospective searching
Past
Now
Current awareness
Future
14
****
Information retrieval: evolution of
storage and distribution media
• 1450: printing with reusable characters/fonts
• 1975: + online access databases (from the 1970s: growing Internet)
• 1985: + CD-ROM
• 1990: + World-Wide Web (based on the Internet)
15
****
Information retrieval:
end user or information intermediaries
End-user
Information intermediary
(Broker or library or ...)
Information
16
****
End user versus information
intermediary
• People can retrieve information themselves, directly, as
so-called “end-users”.
• However,
»the information landscape is complex,
»it may take a lot of time to find the right information,
»it may be costly to search for information.
• Therefore it may be wise to obtain the assistance of an
expert information intermediary, such as a reference
librarian or an information broker.
17
****
About “information”
Evaluating information sources
18
****
Documentary information sources:
evaluate their quality
• We should always be critical when using information
sources, in view of
»the widely varying degrees of quality of information
sources, and of
»the costs associated with searching, finding, using
information.
19
****
Documentary information sources:
criteria to evaluate their quality (1)
• Is the information valid, reliable, trustworthy, genuine,
authentic?
Is the author honest?
Is the source objective, not subjective, without cultural or
political or ideological or commercial bias?
Is the origin an individual or a company or an
organisation?
Is the publication sponsored by some company or
organisation?
20
****
Documentary information sources:
criteria to evaluate their quality (2)
• Is the information accurate, correct?
Who is the author or producer?
Has the source an author or a producer with a high
expertise, a good reputation, good qualifications?
Can the author be contacted for clarification or
discussion?
Was the information reviewed, edited, improved,
corrected, censored, approved, verified, before
publication?
Do experts agree on the information provided?
21
****
Documentary information sources:
criteria to evaluate their quality (3)
• Is the information source unique?
Does it offer a great amount of primary information,
which is not obtainable from other sources?
• Is the information complete?
Is the work available in its entirety?
• Does the source offer a wide coverage?
Is the source comprehensive, substantive?
• Is the information current enough, up to date?
Is a publication date provided?
Is an expiration date provided?
22
****
Documentary information sources:
criteria to evaluate their quality (4)
• Does the document provide suitable references, so that
you can verify statements and find older suitable
information sources?
• Good clear format and lay-out of the information /
User-friendly information system /
Easy for users to orientate themselves within the resource
and to find their way around it?
• Good user support / Good customer support?
• Is the type of distribution medium appropriate?
(print, e-mail, online,...)
23
****
Documentary information sources:
criteria to evaluate their quality (5)
• Is the information what you want?
If not, then reassess your needs and consider other types
of information as well.
24
****
Documentary information sources:
criteria to evaluate their quality (6)
• Is the information suitable for your level of
understanding of the subject?
Is the document popular, suitable for the general public,
for students, for professionals, for scholarly/academic
use…?
Does it report new, primary research (survey,
experiment, observation, measurement, invention) or is it
a review of sources published earlier?
• Does the information repeat or confirm what you already
know, or is it complementary, contradictory, new?
25
****
About “information”
Computer- and network-based information
26
****
Information: from bits
to meaningful information
Digital
computer data
= bits
01
Information = “documents”,
meaningful for and
to be interpreted by
human beings
or
Program code,
meaningful for and
to be interpreted / executed by
a suitable / compatible computer
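As a minimal sketch of the step from bits to a document that a human being can read, the following Python lines turn a short text into a string of bits and back again. The sample text and the choice of the UTF-8 encoding are assumptions made only for illustration.

```python
# Turn a short text into bits and back again.
text = "banana"                      # sample text, chosen for illustration

data = text.encode("utf-8")          # text -> bytes
bits = "".join(f"{byte:08b}" for byte in data)   # bytes -> string of 0s and 1s

# Regroup the bits into bytes (8 bits each) and decode them as text.
decoded = bytes(
    int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)
).decode("utf-8")

print(bits[:16])    # the first two characters, as bits
print(decoded)      # the original text again
```

The same bits are only meaningful when the reader (human or program) knows which encoding was used to write them.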
27
****
Information: digitally stored and
managed information
Categories of digital, computer readable
information / data, forming electronic “documents”,
understandable by human beings.
text
numbers
images
video
+ sounds
01
multimedia
28
****
Information:
types of digital information
Linear text
Hypertext
Sound
Static images
Video
Multimedia / Hypermedia
Programs for computers
Digital information
01
29
****
Some publication media
compared
[Figure: publication media (online / networked, printed, CD-ROM) compared on two axes: update speed and volume]
30
***-
Publications on CD-ROM or online:
advantages compared with hard copy
• Can be cheaper to produce, to transport and to store.
• Can offer better search features.
• Can offer various output formats.
• Can offer fast and efficient “copy and paste” by the
reader/user of information to other documents.
Taken together, these features allow more efficient access to
large, high volume documents or databases.
31
****
Scientific publishing in Utopia:
an ideal scheme
Many authors
author = reader
in science
Many editors / publishers
Online remote access multimedia database server
one global,
international computer
data communication
network
Many database search clients
and user interfaces
Many readers / users
32
****
!? Question !?
Indicate the differences
between reality
and that simplified, ideal scheme
of the information flow.
33
****
!? Question !?
Which basic problems/difficulties
hinder people
from finding / accessing / using information?
34
****
Information retrieval:
basic difficulties
(Part 1)
• In many cases it is not completely clear to the user of an
information retrieval system which information is in fact
needed, required.
• In many cases the need for information cannot be
expressed completely in the form of a query.
One of the reasons is that the complete context of the
information need should ideally be expressed, including
the knowledge and background of the searcher.
35
****
Information retrieval:
basic difficulties
(Part 2)
• Computer systems are artificial, but nevertheless most
use human language in their interface with the human
users, for instance in database search systems.
This may cause difficulties related to language and
vocabulary in particular. Some examples:
• People use different languages and different terms
(vocabularies) to describe a similar concept.
• Concepts, vocabularies and meanings of words and terms
may change over time.
• Meanings of words / terms may depend on their context.
36
****
Information retrieval:
basic difficulties
(Part 3)
• Many different and imperfect retrieval systems should or
must be used.
»To retrieve and access the information that is in principle
available, many different retrieval systems must be
available and be mastered.
»Furthermore, perfect information retrieval software does
not (yet) exist; scientific and technological evolution
in the domain of information retrieval software has been
fast since about 1970.
37
****
Information retrieval:
basic difficulties
(Part 4)
• Information overload
Users are often overwhelmed
by the amount of available information and
by the large influx of new information.
38
****
Information retrieval:
basic difficulties
(Part 5)
• The price (or inaccessibility) of particular information
A lot of information cannot be obtained or at least not free
of charge.
39
***-
Information retrieval:
browsing and searching as methods
• To make information available, the producer of an
information system can offer to the user basically two
different ways for retrieval of the right information from
the system:
»by browsing or
»by searching.
40
***-
Information retrieval:
browsing versus searching
• Browsing a logically
ordered list of terms
• Searching by submitting a
search term to the system
• Logical order /
Sorted by subject
• Alphabetical order /
Not sorted by subject
• Table of contents
• Alphabetical index
• Classification
• Thesaurus
• Hypertext-Hypermedia:
jump from a page
to a linked page
• Hypertext-Hypermedia:
search built in a page
41
***-
Information retrieval:
browsing systems
• In browsing systems, the user can follow some of the
paths offered by the system.
• The information is ordered, according to subject for
instance.
• The user does not have to use his own words to indicate
his needs.
• To support organising and browsing of information items,
some type of classification is applied in many cases.
42
***-
Information retrieval:
examples of browsing systems
• Examples of browsing systems are
»a table of contents in the front part of a book,
»a set of books placed on shelves according to some
classification system,
»a hypertext hierarchical directory on the WWW, or more
generally all hypermedia systems.
43
***-
Information retrieval:
search systems
• In search systems, the user has to express his need for
information by formulating a query, normally in
natural language or in a more formal language.
• In this case the information is normally not ordered
according to some logic, but in most cases in the form of a
well structured compilation of items of a similar form, in
the form of the records of a database when a computer
system is applied.
44
***-
Information retrieval:
examples of search systems
• Examples of search systems are
»the index (the register) in the back part of a book,
»a library or museum catalogue with a search interface,
»a search form on a web page.
45
***-
Information retrieval:
pro and contra of browse systems
Advantages:
»Browsing is relatively easy for the user.
Difficulties for the user:
»The user can explore the information space only along paths
constructed according to the system designers' view of the
world, not according to his own view.
Difficulties for the producer:
»It is relatively costly to construct an information system
based on browsing.
46
***-
Information retrieval:
pro and contra of search systems
Advantages:
»Creation of keyword indexes for fast searching is relatively
simple and cheap and can be automated.
Difficulties for the user:
»Searching is hindered by vocabulary / language problems.
»The users cannot always fully articulate their needs.
47
****
The information industry and the
information market
The components of the information industry
48
****
The components of the
information industry
• Authors
• Publishers
• Distributors
• Users
• Related organizations
49
****
The information industry and the
information market
Overview and evolution
50
****
Increase in the number of scientific
and technical serial publications
[Figure: number of scientific and technical serial publications (logarithmic axis, 1 to 1,000,000) plotted against year, 1650 to 2000]
51
****
The information market:
growth in the database industry
[Figure: number of living databases, number of database producers and number of vendors, 1975 to 1995; vertical axis from 0 to 10,000]
Source: Williams, in: Gale Directory of Databases, 1998.
52
****
The information industry / market:
future trends
(Part 1)
• Growth in the production of databases.
• Less analogue / hard-copy production
= more digital production, storage, and distribution of
information.
• More integration of information types
into multimedia and hypermedia.
53
****
The information industry / market:
future trends
(Part 2)
• Growth in the number of
»producers and distributors,
»end-users searching databases
due to
easier use
and
lower costs of information technology
54
****
Databases and computerized
information retrieval
Introduction
55
****
What is a
database?
A database is a collection of similar data records stored in a
common file (or collection of files).
56
****
Types of databases:
examples
Examples: The databases that form the basis for
»catalogues of books or other types of documents
»computerized bibliographies
»address directories
»a full text newspaper, newsletter, magazine, journal
+ collections of these
»WWW and Internet search engines
»intranet search engines
»...
57
***-
Information retrieval
and related activities: figure
Information management
Information retrieval
Text retrieval
Image retrieval
Presentation of
information
58
***-
Information retrieval:
via a database to the user
Information
content
Linear file
Inverted file
Database
Search engine
Search interface
User
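To illustrate the difference between the linear file and the inverted file in the scheme above, here is a minimal Python sketch. The three sample records and the function name `build_inverted_file` are invented for illustration; real retrieval software adds many refinements (stop words, word positions, fields).

```python
from collections import defaultdict

# Linear file: records stored one after another (hypothetical sample records).
linear_file = {
    1: "marine pollution monitoring",
    2: "pollution of rivers by effluents",
    3: "monitoring of marine reefs",
}

def build_inverted_file(records):
    """Map each word to the sorted list of record numbers that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return {word: sorted(ids) for word, ids in index.items()}

inverted_file = build_inverted_file(linear_file)
print(inverted_file["pollution"])   # records that contain "pollution"
print(inverted_file["marine"])      # records that contain "marine"
```

A search engine can then answer a query by looking up words in the inverted file instead of reading every record of the linear file.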
59
***-
Information retrieval:
the basic processes in search systems
Information
problem
Text
documents
Representation
Query
Evaluation
and
feedback
Representation
Indexed documents
Comparison
Retrieved, sorted documents
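The processes in this scheme (representation, comparison, sorting of the retrieved documents) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: documents and queries are represented as sets of words, and the comparison counts shared words. The sample documents and function names are invented.

```python
def represent(text):
    """Represent a text (document or query) as a set of lower-case words."""
    return set(text.lower().split())

def search(query, documents):
    """Compare the query with each document; return matches, best first."""
    query_terms = represent(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & represent(text))   # comparison step
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Sort by descending overlap, then by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

documents = {
    "d1": "seawater pollution monitoring",
    "d2": "monitoring air quality",
    "d3": "history of printing",
}
results = search("monitoring seawater pollution", documents)
print(results)
```

The evaluation and feedback step is left to the user: inspect the sorted results, adapt the query, and search again.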
60
***-
Information retrieval systems:
many components make up a system
• Any retrieval system is built up of many more or less
independent components.
• These components can be modified to increase the quality
of the results more or less independently.
61
***-
Information retrieval systems:
important components
the information content
system to describe formal aspects of information items
system to describe the subjects of information items
concrete descriptions of information items
= application of the used information description systems
information storage and retrieval computer program(s)
computer system used for retrieval
type of medium or information carrier used for distribution
62
***-
What determines the results of a
search in a retrieval system?
• the information retrieval system
( = contents + system)
Result of a search
• the user of the retrieval system
and the search strategy applied to the system
63
***-
Databases and computerized
information retrieval
Text retrieval and language
64
***-
Text retrieval and language:
a word is not a concept (a)
Problem:
A word or phrase or term is not the same as a concept or
subject or topic.
Word
Concept
Word
65
***-
Text retrieval and language:
a word is not a concept (a’)
So, to ‘cover’ a concept in a search,
to increase the recall of a search,
the user of a retrieval system should consider an
expansion of the query;
that is:
the user should also include other words in the query to
“cover” the concept
66
***-
Text retrieval and language:
a word is not a concept (a’’)
»synonyms!
»narrower terms, more specific terms
(such as particular brand names);
including terms with prefixes
(for instance: viruses, retroviruses, rotaviruses,...)
»spelling variations
(such as UK English versus US English);
possible variations after transliteration
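The expansion of a query with synonyms, narrower terms and spelling variations can be sketched as follows in Python. The expansion table is a hypothetical, hand-made example; in practice such variants would come from a thesaurus or from the searcher's own knowledge.

```python
# Hypothetical expansion table: each search term maps to extra words
# (synonyms, narrower terms, spelling variants) that cover the same concept.
EXPANSIONS = {
    "virus": ["viruses", "retrovirus", "retroviruses", "rotavirus", "rotaviruses"],
    "colour": ["color", "colours", "colors"],
}

def expand_query(term):
    """Return the term plus its known variants, joined with OR."""
    words = [term] + EXPANSIONS.get(term, [])
    return " OR ".join(words)

print(expand_query("colour"))
print(expand_query("banana"))   # no variants known: the term is returned unchanged
```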
67
***-
!? Question !?
Which problems in text retrieval
are illustrated by the following sentences?
68
***-Examples
Time flies like an arrow.
Fruit flies like a banana.
?
71
***-
Text retrieval and language:
ambiguity of meaning (a)
• Problem:
A word or phrase can have more than 1 meaning.
Ambiguity of the meaning of a word is a problem for
retrieval.
This decreases the precision of many searches.
The meaning can depend on the context.
The meaning may depend on the region where the term is
used.
72
***-
Text retrieval and language:
ambiguity of meaning (a’)
»Example:
—Pascal the philosopher
—Pascal the computer language
73
***-
Text retrieval and language:
ambiguity of meaning (a’’)
Problem:
Ambiguity of meaning
may be the cause of low precision.
Concept
Word
Concept
74
****
A word is not a concept
A concept is not a word
Word
Concept
Word
1 word or term does/can not “cover” a concept
= a concept cannot be “covered” by only 1 word or term;
this may be the cause of low recall.
75
****
A word is not a concept
A concept is not a word
Concept
Word
Concept
Ambiguity of meaning
may be the cause of low precision.
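Recall and precision can be computed for a concrete search as in the Python sketch below. It uses the usual definitions (precision = relevant retrieved items / all retrieved items; recall = relevant retrieved items / all relevant items). The document identifiers form an invented example of a search for Pascal, the computer language.

```python
def precision_and_recall(retrieved, relevant):
    """Precision = share of retrieved items that are relevant.
    Recall = share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search for "Pascal" as a computer language:
retrieved = ["doc1", "doc2", "doc3", "doc4"]   # doc3 and doc4 are about the philosopher
relevant = ["doc1", "doc2", "doc5"]            # doc5 uses another word and was missed

precision, recall = precision_and_recall(retrieved, relevant)
print(precision, recall)
```

In this example the ambiguity of the word lowers precision (two of the four retrieved documents are not wanted), while the missed document that uses a different word lowers recall.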
76
***-
Text retrieval and language:
conclusions
• The use of terms and language to retrieve information
from databases/collections/corpora causes many
problems.
• These problems are not recognized, or are underestimated,
by many users of search/retrieval systems
= The power of retrieval systems is overestimated by
many users.
• Much research and development is still needed to enhance
text retrieval.
77
****
Databases and computerized
information retrieval
Hints on how to use information sources
78
****
Hints on how to use information
sources: overview
(Part 1)
• Know the purpose and motivation for each search.
• Do not be lazy: search on your own, before bothering
experts with requests for advice.
• Plan your search in advance.
• Choose the best source(s) for each search.
• Use the right tools for each job (a suitable communication
program for instance, in the case of online searches).
• Do not focus on a single source.
79
****
Hints on how to use information
sources: overview
(Part 2)
• Consider citation indexes besides subject-oriented
databases, as useful secondary information sources.
• Use the available tools for subject searching well.
• Try to cope with the language problems.
• Match your search strategy with the type of source.
• In computer-based retrieval systems, combine search
terms when appropriate, using
»Boolean operators
»proximity operators (for instance “near”,...)
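A proximity operator such as “near” can be sketched in Python as follows. The exact meaning of “near” differs from one retrieval system to another; the maximum distance of 3 words used here is an assumption for illustration only, as are the function name and the sample sentence.

```python
def near(text, word_a, word_b, max_distance=3):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a]
    positions_b = [i for i, w in enumerate(words) if w == word_b]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)

text = "monitoring of seawater pollution caused by industrial effluents"
print(near(text, "seawater", "pollution"))     # adjacent words
print(near(text, "monitoring", "effluents"))   # seven words apart
```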
80
****
Hints on how to use information
sources: overview
(Part 3)
• Work cost-effectively.
• Use special care when searching for names.
• Work iteratively.
• Keep a record of your work.
• Be critical: not all information is correct or useful.
• Stop searching when “enough is enough”
• Give up if necessary... (Not all questions have an answer.)
• ...
81
****
Hints on how to use information
sources: subject searching
• When you search for information on a particular
topic/subject: investigate if the database producer offers
»a subject classification scheme and/or
»a list of controlled/approved/accepted subject terms, and/or
»a subject thesaurus
• Exploit these, if they are available.
• In most cases you should find and use
synonyms and narrower terms
• Use broader and/or related terms, if appropriate.
82
****
Hints on how to use information
sources: Boolean combinations (1)
Most text search systems understand the basic
Boolean operators:
AND
= obtain records that contain both search
terms
OR
= obtain records that contain one or both
search terms
NOT
= exclude records that contain a search term
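The three Boolean operators correspond to operations on sets of records. The Python sketch below shows this with an invented index that lists, for each search term, the numbers of the records containing it.

```python
# Hypothetical index: for each search term, the set of record numbers containing it.
index = {
    "sea": {1, 2, 3},
    "pollution": {2, 3, 4},
    "oil": {3, 5},
}

both = index["sea"] & index["pollution"]      # sea AND pollution
either = index["sea"] | index["pollution"]    # sea OR pollution
without = both - index["oil"]                 # (sea AND pollution) NOT oil

print(sorted(both), sorted(either), sorted(without))
```

AND narrows a search, OR widens it, and NOT removes records, which is why NOT should be used with care: it can also remove relevant records.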
83
****
Hints on how to use information
sources: Boolean combinations (2)
Most text search systems understand the basic Boolean
operators typed in capital characters:
OR
AND
84
****
Hints on how to use information
sources: Boolean combinations (3)
In the case of computer-based information sources, use
Boolean combinations of search terms when appropriate
and when possible.
term x1
term y1
term z1
OR
OR
OR
term x2 AND term y2 AND term z2
OR
OR
OR
term x3
term y3
term z3
AND ...
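The scheme above (OR between the terms of one concept, AND between concepts) can be turned into a query string with a small Python function. This is a sketch in a generic format; the function name `build_query` and the sample term lists are invented, and the syntax accepted by a concrete database may differ.

```python
def build_query(concepts):
    """Join the terms of each concept with OR, then join the concepts with AND."""
    groups = []
    for terms in concepts:
        # Quote multi-word terms so that they are kept together as a phrase.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Hypothetical term lists for three concepts.
concepts = [
    ["virus", "viruses", "retrovirus"],
    ["vaccine", "vaccination"],
    ["review", "state of the art"],
]
print(build_query(concepts))
```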
85
****
!? Question !? Task !? Problem !?
How many (and which) concepts
do you see in a search for
“general reviews
about
monitoring seawater pollution that is due to effluents”?
86
****
!? Exercise !? Task !? Problem !?
Prepare off-line, on paper, a suitable search query
in a generic format, to find
“general reviews
about
monitoring seawater pollution that is due to effluents”
as the basis for later, concrete searches in databases.
(Limit yourself to 1 of the concepts.)
***-Example
Hints on how to use information
sources: example of a search query
Example: Searching for the concept “sea” can or should
involve, for instance, the following words in a Boolean
OR combination:
baltic OR bay OR bays OR coast OR coastal OR coastline
OR coasts OR cove OR coves OR gulf OR mangrove OR
mangroves OR marine OR mediterranean OR noordzee OR
noordzeekust OR noordzeekusten OR ocean OR oceanic OR
oceans OR reef OR reefs OR “saline-freshwater interface”
OR sea OR seas OR seashore OR seawater OR seawaters
OR shore OR shores
87
88
****
!? Question !? Task !? Problem !?
What did you learn
from the exercise
on the formulation of a query?
89
****
Hints on how to use information
sources: work iteratively
Work iteratively =
search, investigate your results, refine your search, search
again, and so on;
do not try to find everything in 1 step, with 1 search.
Query
Feedback
Results
Searching
90
****
“The ability to ask the right question
is more than half the battle of finding the answer.”
Thomas J. Watson
?
91
****
Hints on how to use information
sources: when to stop searching?
Develop a feel for the “curve of diminishing returns”:
If you spend too much time, effort, and/or money
with too few benefits, you should stop.
payoff
Time to stop?
time / effort / money
92
****
Knowledge organisation:
classifications, and thesaurus systems
Introduction
93
****
Knowledge organisation:
introduction
• To organise knowledge / documents / books / reports /
information / data / records / things / items / materials
for more efficient storage and retrieval, some related,
similar tools / systems / methods / approaches are used.
• Often but not yet always, this process is assisted by a
computer system.
• Good systems are expanded and updated when the need
arises.
• The organization system applied should ideally be clearly
and immediately visible or even searchable on computer,
by the user of the materials.
94
***-
Knowledge organisation:
some tools
• Various tools / systems / methods / approaches are
available:
»Classification
»Taxonomy
»Thesaurus
»Ontology
»…
95
****
Knowledge organisation:
classifications, and thesaurus systems
Classifications
96
***-Examples
Classification systems:
introduction
• Classification systems
present the subjects in a
logical order, usually going
from the more general to the
more specific.
****Examples
Classification systems:
examples of universal systems
• Universal means here: covering all subjects
• Not just one but several competing systems exist.
Examples
»Universal Decimal Classification = UDC
used mainly outside U.S.A.
»Dewey Decimal Classification = DDC
used mainly in U.S.A.
»Library of Congress Classification
used mainly in U.S.A.
»...
97
98
****
Knowledge organisation:
classifications, and thesaurus systems
Thesaurus systems
99
****
Thesaurus:
description
• Thesaurus (contents) =
»system to control a vocabulary
(= words and phrases + their relations)
»the contents of this vocabulary
• Thesaurus program =
program to create, manage, modify and/or search a
thesaurus using a computer
100
****
Thesaurus
relations
Term(s) with broader meaning
BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning
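The thesaurus relations in this scheme can be stored in a simple data structure. The Python sketch below uses invented entries and an invented helper function; it shows how following the NT links collects a term together with all its narrower terms, which is useful for expanding a query.

```python
# Hypothetical thesaurus entries using the relation codes from the scheme above.
thesaurus = {
    "viruses": {
        "BT": ["microorganisms"],
        "NT": ["retroviruses", "rotaviruses"],
        "RT": ["vaccines"],
        "UF": ["virus particles"],
    },
    "retroviruses": {"BT": ["viruses"], "NT": [], "RT": [], "UF": []},
    "rotaviruses": {"BT": ["viruses"], "NT": [], "RT": [], "UF": []},
}

def narrower_terms(term):
    """Return the term and all terms below it in the hierarchy (follows NT links)."""
    found = [term]
    for child in thesaurus.get(term, {}).get("NT", []):
        found.extend(narrower_terms(child))
    return found

print(narrower_terms("viruses"))
```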
101
***-
Thesaurus systems that cover all
subjects
• General systems
• Universal systems
• Covering all subjects
• Broad and shallow systems
• Horizontal systems
***-Examples
Thesaurus systems that cover all
subjects: examples
• thesaurus system built into word processing software
• Library of Congress Subject Headings (LCSH)
• thesaurus system that runs on a pc;
see for instance http://www.wordweb.co.uk/free/
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html
»http://thesaurus.plumbdesign.com/
102
103
***-
!? Exercise !? Task !? Problem !?
Practice using a general thesaurus system
that is built in your program for word processing.
104
**--
!? Exercise !? Task !? Problem !?
Have a look at
various global, general, universal thesaurus systems.
Consider which ones may be useful
for your future online information searches.
105
****
Computer networks,
data communication and Internet
Introduction
106
****
Data communication:
a definition
• Interpersonal communication
» Telecommunication
—Broadcast
—Telephone
—Data communication
–Remote login
–File transfer
–Hypertext transfer
–Electronic mail
–...
107
****
Data communication:
which types of ‘data’?
Linear text
Hypertext
Sound
Static images
Video
Multimedia / Hypermedia
Programs for computers
Digital information
01
108
****
Data communication:
which types of ‘data’?
• The same types of data (information) that can be stored
and managed on a computer can be transferred over
computer networks to one or several other computers.
• So the networks form an important extension of the
stand-alone computers.
• “The network is the computer”
109
****
Data communication:
applications
• Hard-copy transfer (Fax)
• Online use of the processing power of a remote computer
• Online access to information sources !
»library catalogues,
»bookshop catalogues,
»publisher’s catalogues,
»campus-wide and community information systems,
»(text or multimedia) databases,
»network-based journals, ...
110
***-
Data communication:
problems, difficulties, limitations
• Low transfer speed
• Technical complexity
111
****
Computer network protocols:
definition
• When 2 computer systems communicate via a network,
they do that by exchanging messages.
• The structure of network messages varies from network
to network.
• Thus the message structure in a particular network is
agreed upon a priori and is described in a set of rules,
each defined in a protocol.
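What “a message structure agreed upon a priori” means can be shown with a toy protocol in Python. The layout below (1 byte for the message type, 2 bytes for the length, then the payload) is invented for this illustration and is not the format of any real Internet protocol.

```python
import struct

# Invented message layout, agreed in advance by both sides:
#   1 byte  message type
#   2 bytes payload length (big-endian, "network byte order")
#   n bytes payload
HEADER = struct.Struct("!BH")

def pack_message(msg_type, payload):
    """Build a message: fixed header followed by the payload bytes."""
    return HEADER.pack(msg_type, len(payload)) + payload

def unpack_message(message):
    """Split a message back into its type and payload using the same rules."""
    msg_type, length = HEADER.unpack(message[:HEADER.size])
    payload = message[HEADER.size:HEADER.size + length]
    return msg_type, payload

message = pack_message(1, b"hello")
print(message)
print(unpack_message(message))
```

Sender and receiver can only understand each other because both apply the same rules; a receiver that expected a different header layout would misread the same bytes.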
112
****
Computer networks,
data communication and Internet
National Wide Area Networks
113
****
National
Wide Area Networks
• Public access national packet switching networks
• Research computer networks
• Public access made available by
Internet Service Providers
• ...
114
****
Computer networks,
data communication and Internet
International computer networks
****Examples
International computer networks:
examples
• National public data communication networks linked
together
• Internet
• FidoNet
• Bitnet / EARN
• Usenet
• ...
115
116
****
Computer networks,
data communication and Internet
The Internet data communication network
117
****
The Internet
data communications network (Part 1)
• “Internet” is not well-defined.
• A network of smaller networks:
The global collection of interconnected local area,
regional and wide-area (national backbone) networks
which use the TCP/IP suite of data communication
protocols.
118
****
The Internet
data communications network (Part 2)
• Links computers of various types.
• Is constantly growing.
• The analogy of a superhighway has been used to describe
the emerging system of networked computers.
• The Internet has no owner, and is not managed by one
organization.
119
****
The Internet:
access from your Local Area Network
Your microcomputer
Local Area Network (LAN)
One of the national networks
The global Internet
120
****
Host computers in the Internet:
definition
• A host (computer) is a domain name that has a unique IP
address record associated with it.
• Could be any computer connected to the Internet by any
means.
• For instance:
www.vub.ac.be
121
****
Transmission Control Protocol /
Internet Protocol (TCP/IP)
• the main suite of transport protocols used on the Internet
for connectivity and transmission of data across
heterogeneous systems
• “glue that holds the Internet together”
• an open standard
• available on most Unix systems, VMS and other
minicomputer systems, many mainframe and
supercomputing systems and some microcomputer and
PC systems
122
****
Internet: growth in number of hosts
worldwide: linear plot
[Figure: linear plot of the number of Internet hosts worldwide, January of each year, 1993 to 1998; vertical axis from 0 to 20,000,000]
123
****
Internet Service Provider
= ISP
Internet Service Providers provide their clients access to
Internet + in many cases
»an email address / server
»space for a web site
»software tools to start
»training
»technical support
»an accessible location for a WWW site of the client
»assistance with WWW site design and promotion
124
****
World-Wide Web = WWW
Introduction
125
****Example
The WWW:
example of a welcome page
126
****
URL =
Uniform Resource Locator
• = draft standard for specifying an object on the Internet
• the structure is in most cases
protocol://computer_address[/path_name/file_name]
• examples:
»telnet://biblio.vub.ac.be
»ftp://ftp.vub.ac.be/
»gopher://gopher.vub.ac.be/
»http://www.vub.ac.be/BIBLIO/index.html
»news://news.server.edu/comp.infosystems.www
127
****
URL
format / structure
1. The first part of a URL, before the colon “:”, specifies the
access method = protocol
2. The second part of the URL, after the colon “:”, is
interpreted specific to the access method.
In general, two slashes after the colon
indicate a machine /computer name.
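The parts of a URL can be inspected with the `urlsplit` function of Python's standard library, as in the short sketch below. The URL is one of the examples given earlier in this presentation.

```python
from urllib.parse import urlsplit

# One of the example URLs from this presentation.
parts = urlsplit("http://www.vub.ac.be/BIBLIO/index.html")

print(parts.scheme)    # access method = protocol
print(parts.netloc)    # computer address
print(parts.path)      # path name and file name
```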
128
****
!? Question !? Task !? Problem !?
What is the difference between
Internet and the World-Wide Web?
129
****
The WWW is an application of
Internet
• The World-Wide Web (WWW) is a service, an application
of Internet.
• It is based on the Internet infrastructure.
• So the WWW is newer than the Internet.
The concept of the WWW was created at the end of the
1980s when the Internet was already well established.
130
****
The WWW is an application of
Internet: scheme
Data communication
Internet
WWW
131
****
The WWW:
the essential elements
• Information delivery and access using
hypertext/hypermedia documents/objects
»html documents
»http protocol:
http clients
http servers
• Integration of protocols in the Internet:
»http servers offering html documents including links to
other http servers,
telnet servers, ftp servers, nntp servers, gopher servers,...
132
****
World-Wide Web = WWW
WWW client programs
133
****
WWW:
client / browser programs
• To access the WWW, you run a browser program.
• The browser reads documents, and can fetch documents
from other sources. Information providers set up
hypermedia servers which browsers can get documents
from.
• The browser can display hypertext documents.
Hypertext is text with pointers to other text. The browsers
let you deal with the pointers in a transparent way:
select the pointer, and you are presented with the text that
is pointed to.
134
****
WWW: examples of
browsers for your own computer
Browsers are available for many computer platforms;
in particular:
browsers for Windows + Winsock:
»Netscape
»Microsoft Internet Explorer
»...
135
****
!? Question !? Task !? Problem !?
Browse the WWW,
using an available
browser client program.
136
***-
!? Question !? Task !? Problem !?
What came first: Internet or WWW?
Explain.
137
****
World-Wide Web = WWW
Saving information from a web
138
****
WWW: How to save information
from a web?
Information displayed by your web browser/client program
can be saved,
• by select, copy, paste in another document (and save)
• by saving a complete page to your disk
»in separate files
(for instance 1 HTML file + some image files)
»in 1 file, using Microsoft Internet Explorer 5 or a later
version
• by copying the information into an e-mail message that
you send to your own e-mail account
139
****
!? Exercise !? Task !? Problem !?
Copy some text fragment from WWW
and paste it into another document
on your computer.
140
****
!? Exercise !? Task !? Problem !?
Save a text from WWW
to disk, as HTML,
using a browser program.
141
****
!? Exercise !? Task !? Problem !?
Display an HTML file
that you have saved
from the WWW to your disk,
in a program for word processing.
Is the file displayed properly?
142
****
World-Wide Web = WWW
The success of WWW
143
****
WWW: growing number of
WWW servers
[Figure: plot of the number of WWW servers per year, 1993 to 2000; vertical axis from 0 to 7 000 000 servers]
144
****
WWW as popular method to access
information from computers
• The WWW has quickly become the most popular medium
to access information that resides on various computers
that are connected to a computer network.
145
****
Online access information
sources and services
Introduction
146
****
Internet based information sources:
problems / difficulties (Part 1)
• Redundancy and overlap:
On the one hand, there is too much information on some
topics; in other words, the redundancy and overlap are high in
many cases.
• Too few information sources:
On the other hand, there are too few information sources on
some topics.
147
****
Internet based information sources:
problems / difficulties (Part 2)
• No order is imposed on most sources.
Quality checks / quality controls are not performed.
Related to this: it is not required to register new information
offered.
Is the information that you find real, honest, authentic?
148
****
Internet based information sources:
how many? how much information?
In 2001:
• More than 10 terabytes (= 10 000 gigabytes) of text data
In 2002:
• More than 2000 million (= 2 billion) unique URLs in the
total Internet
149
***-
Online access information
sources and services
Types of online access information systems
150
****
Types of online access information
systems: “free” versus “fee”
• A lot of the information on the Internet is available free of
charge, but another part is only accessible when a fee is
paid to the producer and / or the distributor.
• Some organisations pay these fees for some sources and
then organise access, so that the members of the
organisation can retrieve and exploit the information as if
it is free of charge.
151
****
Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
Fee-based online information services
(NOT free of charge)
152
****
Types of online access information
systems: “free” for members only
Public access information sources
free of charge
Fee-based online information services,
made accessible “free of charge”
by an institute to its members
Fee-based online information services
(NOT free of charge)
153
****
Online access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW
154
****
Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context,…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.
****Example
Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/
155
****Example
Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/
156
****Example
Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages
157
****Example
Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites
»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/
158
****Example
Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/
159
****Example
Encyclopedias accessible through
Internet and the WWW: examples
• Several encyclopedias and dictionaries have been
integrated and are searchable simultaneously and free of
charge through
http://xrefer.com/
160
****Example
Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm
• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.
161
162
****
Online access information
sources and services
Internet directories and indexes
163
****
Internet: meta-information about
Internet information sources
• in printed manuals and guides:
- it is not always possible to get a copy fast
- it costs money to get a copy
- they are soon out of date
• offered on the WWW!:
+ directly available when we want to use the Internet
+ many systems are accessible free of charge
+ most systems are regularly updated
• (“intelligent agent” software on client PC)
164
****
Internet: subject-oriented meta-information
offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»subject hypertext directories = subject guides
»key word indexes, generated automatically, for searching
»collections of links or forms to the above
»(multi-threaded search systems)
165
****
Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.
• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
166
****
Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.
167
****
Internet global subject directories:
limitations
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.
• They are suitable mainly for broad searches that can be
difficult to formulate in words, but NOT for more specific
searches that require combinations of several concepts.
168
****
Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.
****Example
Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.
169
***-Example
Internet global subject directories:
Yahoo! links in pediatrics
• Health > Medicine > Pediatrics:
• International Pediatric Chat - for professionals to share information and education
regarding children's health care.
• National Med/Peds Residents' Association - organization for residents, practitioners and
medical students interested in combined internal medicine and pediatrics.
• Neonatology Network - information and communication platform for neonatologists
and pediatricians.
• Pediatria OnLine - qui si parla di bambini, fra pediatri e con le famiglie.
• Pediatric Critical Care
• Pediatric Database (PEDBASE) - containing descriptions of over 500 childhood
illnesses.
• Pediatric Endocrinology Conference - LWPES/ESPE joint meeting occurring July 6-10
2001.
• Pediatric Endoscopic Photos - illustrating intestinal problems in children.
170
***-Example
Internet global subject directories:
Yahoo! for pediatrics
• Health
> Medicine
> Pediatrics:
link to a digital library
(health sciences)
for young patients
171
***-Example
Internet global subject directories:
Yahoo! to pediatrics organisations
• Health
> Medicine
> Pediatrics
> Organizations:
link to the
American Academy
of Pediatrics
172
***-Example
Internet global subject directories:
Yahoo! links to pediatrics schools
• Health > Medicine > Pediatrics
>Schools, Departments, and Programs
• University of Rochester - partnership between pediatric residents and
community-based agencies that serve children and their families.
• Michigan State University@
• Royal College of Paediatrics and Child Health - responsible for training,
examinations, professional standards, and organisation of child health
services for the UK.
• Tohoku University
• University of Alabama at Birmingham - programs and training opportunities
in pediatrics. Also contains faculty information and sub-specialty
descriptions.
• …
173
***-Example
Internet global subject directories:
searching with a query in Yahoo! (1)
• The directory of Yahoo! can not only be browsed, but can
also be searched with a query.
• However, in this way the hierarchical structure is not well
exploited.
• For the formulation of a search query, Yahoo! can provide
automatic assistance related to spelling and word
variations.
For instance:
After searching for “Capetown”, Yahoo! answers:
Other Spellings: Try searching for cape town instead.
174
***-Example
Internet global subject directories:
searching with a query in Yahoo! (2)
• When such a query does not provide results, then Yahoo!
uses a much larger external Internet index (not produced
by Yahoo!) to execute a query based on textual search
statements.
The chosen Internet index has varied over time.
• This mechanism is not made very clear and may confuse
the user.
175
***-Example
Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.
176
***-Example
Internet global subject directories:
Google directory
• A hypertext global subject directory can be found at
http://directory.google.com/
• Accessible free of charge.
• Very similar to the Open Directory Project.
177
***-Example
Internet global subject directories:
Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/
• The contents are also used in the
Google Directory system.
• Accessible free of charge.
178
***-Example
Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.
179
180
****
!? Exercise !? Task !? Problem !?
Try to find Internet sources
which are relevant for you,
by using an Internet-based
global subject directory.
181
***-
Internet global subject directories:
evaluation criteria (Part 1)
• Is usage free of charge?
• Wide coverage?
• Up to date? Frequent updates?
Only few dead / broken links?
• Good coverage of the sources in that part of the world in
which you are interested?
• Does the manager of the directory refuse to give priority
to sites that want to pay to get a prominent place in the
directory?
182
***-
Internet global subject directories:
evaluation criteria (Part 2)
• Easy user interface?
• Short response times?
• Are mirror sites available closer to you for faster
response?
• Good presentation, description of each site?
• Is a rating, appreciation, review offered for each listed
site?
• Is translation of documents offered free of charge?
183
***-
Internet global subject directories:
evaluation criteria (Part 3)
• Good documentation and online help?
• Good help desk available?
• High stability and reliability?
184
***-
Internet global subject directories:
evaluation criteria (Part 4)
• Are other services offered from the same site or with the
same interface?
Is the subject directory integrated with other services?
Additional services can be
»an Internet index or a WWW index or a gateway to such an
index for searching with a query
»travel guides, flight and hotel reservations, maps,...
»WWW-based e-mail and e-mail address directories
»auctions through WWW
185
***-
Internet subject directories:
non-global, more specific systems
a directory limited to
sources in/of a country or region
the
complete
WWW
a global
subject
directory
can lead to
a directory restricted to
a specific subject domain
(“portal”)
***-Examples
Internet subject directories focusing
on a specific subject domain
• Computer science & engineering:
http://www.ub.lu.se/eel/
• Marine science and oceanography:
http://oceanportal.org/
186
187
****
Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.
188
****
Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software
user interface to a search engine
Internet index search engine
Internet information source
Internet crawler and indexing system
database of Internet files, including an index
189
****
Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
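The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any real search engine: the pages that a "robot" would collect are replaced here by a small hypothetical dictionary (the example.org URLs and texts are invented), and the function names build_index and search are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" might have collected: URL -> text.
collected_pages = {
    "http://example.org/a": "Information retrieval with databases",
    "http://example.org/b": "Internet indexes and information sources",
    "http://example.org/c": "Bananas consist of atoms",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(collected_pages)
print(sorted(search(index, "information sources")))
```

The query is answered from the prebuilt index only; the pages themselves are not contacted at search time, which is why such a system can answer quickly but may be out of date.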
190
***-Example
Internet indexes:
AltaVista
The primary search interface can be found in the US:
http://www.altavista.com/
http://www.av.com/
(These addresses all lead to the same information.)
Mirror site in UK:
http://www.altavista.co.uk/
191
***-Example
Internet indexes:
AltaVista: features
• Allows full text searching of the WWW
• Allows advanced Boolean searching
(in “Advanced” mode)
• Offers relevance ranking of search results
• Offers a link to an Internet subject directory (Looksmart)
• Offers links to systems to find
images, sounds,… (multimedia) in the Internet
192
***-Example
Internet indexes:
Fast = All the Web
• The search interface can be found at:
http://www.alltheweb.com/
• You can search the WWW and ftp servers.
• The database is one of the biggest.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
193
****Example
Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in 2001, 2002.
• For retrieval an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when
»many sites/pages point to it
»“important” sites/pages point to it
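The idea of ranking by links can be illustrated with a simplified sketch in Python. It is not Google's actual algorithm; it is a small iterative computation in the spirit of the published PageRank idea, in which a page passes a share of its score to the pages it points to. The function name link_rank, the damping value 0.85, the number of iterations and the tiny example web are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links maps a page to the pages it points to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score

# Hypothetical web: A, B and D point to C; C points to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = link_rank(web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:2])
```

In this example C comes first because many pages point to it, and A comes second because the "important" page C points to it.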
194
****Example
Internet indexes:
Google (Part 2)
• Full text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other formats
such as Adobe PDF, Microsoft Word, Microsoft Excel,
Microsoft PowerPoint, Rich Text Format,…
195
***-
!? Question !? Task !? Problem !?
In spite of the popularity of the Google Internet index,
there are limitations in the search features.
Which limitations?
196
***-Example
Internet indexes:
Google limitations
• Google does NOT offer/allow
»manual or automatic stemming,
manual or automatic truncation
»automatic classification of WWW pages
197
****Example
Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»searching for images on the WWW
»searching an archive of Usenet messages +
posting to Usenet groups
• Thus Google has become a great integrator / aggregator.
198
***-
!? Exercise !? Task !? Problem !?
Read the manual
and
make a search with Google.
199
***-Example
Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system, because the search interface can be found
with the search functions that have been built into one of
the most widespread Internet browsers, Microsoft Internet
Explorer, and because it is offered by
http://search.msn.com/
• Is based on an Internet index created by another
company.
200
***-Example
Internet indexes:
Scirus
• Allows you to search for manually selected scientific
information (only) on the WWW, including access
controlled sites, such as the peer-reviewed articles in the
journals that are published in ScienceDirect by Elsevier.
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system that is
also used by Alltheweb.
• The search interface: http://www.scirus.com
201
***-Example
Internet indexes:
Scirus features
• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF, PostScript and other formats.
202
****
Internet indexes:
coverage / size of each index
The indexes grow and their “size ranking” is variable.
Biggest systems in 2002:
• Google !
• AltaVista
• (Fast =) All the Web (serving also Lycos)
• Systems based on the INKTOMI database of WWW
pages, such as Hotbot, MSN Web search,…
203
****
!? Exercise !? Task !? Problem !?
Try to find Internet sources
which are relevant for you,
by using an Internet index.
204
***-
Internet indexes:
variations among various systems
• Besides their common aims and characteristics, we can
nevertheless see differences, variations among the
searchable Internet index systems.
• To illustrate these variations and to assist Internet users
to make a decision on which search system to use, the
following list of some features and evaluation criteria can
be useful.
205
***-
Internet indexes:
evaluation criteria (Part 1)
• Is usage free of charge?
• How complete is the coverage?
• Is the coverage good (or poor) for a particular geographic
region?
• Is the coverage good (or poor) for a particular type of
documents?
• Is the searchable database up to date? Is the database
updated frequently? Do the search results contain only
few dead (broken) links?
206
***-
Internet indexes:
evaluation criteria (Part 2)
• Is spamming filtered out, to give other pages a better
chance of turning up in the result set?
Can the system cluster presumed duplicate documents in
the results?
Or does the system simply eliminate presumed duplicate
documents from its database?
• Does the database system work with a full text indexing of
each ASCII and HTML document that has a place in the
database, so that full text searching is possible?
207
***-
Internet indexes:
evaluation criteria (Part 3)
• Are the contents of meta-fields also indexed to make them
searchable?
• Does the system index also the text in files on the web that
consist of non-ASCII codes to make these also searchable
and retrievable? For instance files in the format of the
various versions of
»Microsoft Word
»Microsoft PowerPoint
»Adobe Acrobat (Portable Document Format)
208
***-
Internet indexes:
evaluation criteria (Part 4)
• Field indexing, so that searching for the contents of a
particular field is possible?
for instance:
the HTML title,
HTML keywords,
URL,
date,
link,
Java applet,
text,
image file,
sound file,
video file,...
209
***-
Internet indexes:
evaluation criteria (Part 5)
• Does the system offer powerful search options like
»truncation?
»word stemming?
»Boolean search combinations?
»proximity searching?
»automatic translation of your search terms in several other
languages?
»spelling check of your search terms?
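What some of these search options mean can be shown with a small Python sketch. It does not reproduce the query syntax of any particular search system; the three documents and the function names are hypothetical and serve only to show truncation, Boolean combinations and proximity searching on tokenised text.

```python
import re

# Hypothetical mini collection: document number -> text.
documents = {
    1: "Libraries offer online library catalogues",
    2: "The librarian searches the catalogue",
    3: "Online access to a book database",
}

def tokens(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_truncation(doc_tokens, stem):
    """Truncation: 'librar*' matches any word that starts with 'librar'."""
    return any(word.startswith(stem) for word in doc_tokens)

def within_proximity(doc_tokens, first, second, distance):
    """Proximity: both words occur at most `distance` positions apart."""
    pos_first = [i for i, w in enumerate(doc_tokens) if w == first]
    pos_second = [i for i, w in enumerate(doc_tokens) if w == second]
    return any(abs(i - j) <= distance for i in pos_first for j in pos_second)

def search(predicate):
    """Return the numbers of the documents for which the predicate holds."""
    return sorted(n for n, text in documents.items() if predicate(tokens(text)))

truncated = search(lambda t: matches_truncation(t, "librar"))
boolean_and = search(lambda t: "online" in t and "catalogues" in t)
boolean_not = search(lambda t: "online" in t and "book" not in t)
near = search(lambda t: within_proximity(t, "online", "catalogues", 2))
print(truncated, boolean_and, boolean_not, near)
```

Note that the Boolean search for "catalogues" misses document 2, which contains only the singular "catalogue"; truncation or stemming is meant to solve exactly this kind of problem.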
210
***-
Internet indexes:
evaluation criteria (Part 6)
• Can the results be limited to a certain time period?
For instance based on the date
»of the file as noted by the server computer, or
»of the most recent indexing of the file
• Is the user interface easy to understand and efficient to
use?
• Is a user interface offered in your own language?
• Does the system rank the items in the result set according
to their presumed relevance?
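A very simple form of relevance ranking can be sketched as follows in Python. Real systems combine many more signals; here the presumed relevance is assumed to be only the fraction of words in a document that match the query terms, and the example URLs and texts are hypothetical.

```python
import re

def words_of(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(text, query):
    """Fraction of the words in the text that are query terms."""
    words = words_of(text)
    if not words:
        return 0.0
    terms = set(words_of(query))
    return sum(1 for word in words if word in terms) / len(words)

def rank(documents, query):
    """Return the URLs of matching documents, most relevant first."""
    scored = [(relevance(text, query), url) for url, text in documents.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    return [url for score, url in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical result set: URL -> text of the page.
pages = {
    "http://example.org/1": "Internet index, Internet search",
    "http://example.org/2": "An Internet directory for browsing by subject",
    "http://example.org/3": "Bananas consist of atoms",
}
print(rank(pages, "internet"))
```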
211
***-
Internet indexes:
evaluation criteria (Part 7)
• Possibility to combine Boolean retrieval with relevance
ranking of results?
• Can the results be ordered according to date
»of the file as noted by the server computer, or
»of the most recent indexing of the file
• Can the results be ordered according to size?
• Can all the results (documents) from the same site be
grouped together (clustered)?
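Grouping results from the same site can be sketched in Python by using the computer name in each URL as the grouping key. The function name group_by_site is an assumption, and treating one host name as one "site" is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_site(result_urls):
    """Group result URLs by the host name of the server, keeping their order."""
    groups = defaultdict(list)
    for url in result_urls:
        groups[urlsplit(url).hostname].append(url)
    return dict(groups)

results = [
    "http://www.vub.ac.be/BIBLIO/index.html",
    "http://bubl.ac.uk/link/",
    "http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/",
]
clusters = group_by_site(results)
for site, urls in clusters.items():
    print(site, len(urls))
```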
212
***-
Internet indexes:
evaluation criteria (Part 8)
• Can the system rank the results (documents) on the basis
of the number of WWW hyperlinks to that document?
• Does the system refrain from placing/ranking some results
(documents) higher in the results list on the basis of
payments by the producer of those documents to the
search system company?
• Are advertisements / sponsored links / sponsored results
clearly distinguished from normal (not sponsored) search
results?
213
***-
Internet indexes:
evaluation criteria (Part 9)
• Short response times?
• Are mirror sites available closer to you for faster
response?
• Does the system offer a good presentation format of each
result (document/page/item)?
For instance: are search terms indicated / highlighted in
the results?
• Good and detailed summary of each result available?
• Offers an analysis of words occurring in the results,
which can help you to refine a search?
214
***-
Internet indexes:
evaluation criteria (Part 10)
• Is translation of documents offered free of charge?
• High stability and reliability?
No large variations/fluctuations in the results from
identical searches at different times.
• Good documentation and online help?
• Good help desk available?
• Can the search system provide updated results through
electronic mail, as a current awareness tool?
215
***-
Internet indexes:
evaluation criteria (Part 11)
• Other services available besides the normal WWW index:
»index to news resources, that is more frequently updated?
»anonymous ftp file index?
»gopher index?
»searchable Usenet newsgroups archive?
»Internet subject directory?
»White pages = people finder = addresses = ...
»WWW-based e-mail and e-mail address directories
»auctions through WWW
216
***-
Internet indexes:
evaluation criteria (Part 12)
• Is the search/query also submitted to another database to
obtain more results?
for instance:
to a book database to obtain book descriptions
besides WWW documents
217
***-
Internet indexes:
evaluation criteria (Part 13)
• Are results (retrieved documents) grouped / classified /
clustered by the search system, on the basis of the subjects
of the documents and are these presented as groups /
clusters / classes to the user of the search system, to assist
the user in coping with the problems that can be caused
for instance by multiple meanings of words used in a
search query?
218
***-
!? Question !? Task !? Problem !?
Why do different Internet search engines
(in most cases)
give different results for an identical search?
219
****
Coverage of Internet directories and
Internet indexes
Internet information sources
A global Internet directory
A global Internet index
220
****
Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches
Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches
Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
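How a multi-threaded search system might combine the answers of several directories and indexes can be sketched in Python. The result lists below are hypothetical and no real search service is contacted; the function name merge_results and the interleaving strategy are assumptions, one of several possible ways to merge ranked lists.

```python
from itertools import zip_longest

def merge_results(result_lists):
    """Interleave ranked lists from several search tools, dropping duplicates."""
    merged, seen = [], set()
    # Take the first result of every tool, then the second of every tool, etc.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical answers of two indexes and one directory to the same query.
index_a = ["http://example.org/1", "http://example.org/2"]
index_b = ["http://example.org/2", "http://example.org/3", "http://example.org/4"]
directory = []  # this tool found nothing

merged = merge_results([index_a, index_b, directory])
print(merged)
```

A tool that returns nothing does not disturb the merged list, which is the situation in which such a system is most useful.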
221
***-
!? Question !? Task !? Problem !?
Which information on the Internet
is not covered
by many searchable Internet indexes?
222
***-
Internet indexes cover only a part of
the Internet: introduction (1)
The “visible” part of Internet
The “hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index
like AltaVista, Google,...)
223
***-
Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.
224
***-
Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...
Internet
WWW
CGI, ASP,...
Static indexable texts in the WWW
( = on HTTP server computers)
covered partly by Internet indexes
Rapidly changing information,
such as news
Databases
and
file archives
accessible through
the Internet
Word
files
Information accessible only
when passwords are used
PDF
files
225
***-
Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet
»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups
226
****
Finding multimedia files on the
Internet
Several public access search systems are available
free of charge
to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)
»sound / audio files (music, speeches,...)
»video
227
****
Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).
****Examples
Finding images on the Internet:
examples of search engines
• http://alltheweb.com !!!
• http://gallery.yahoo.com/ !
• http://images.google.com/ !!!!
or through http://www.google.com/
• http://multimedia.lycos.com/
• http://www.altavista.com/ !!
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)
• http://www.ditto.com/ !
228
**** Examples
Finding images on the Internet:
screen shot of a Google image search
229
230
***-
!? Exercise !? Task !? Problem !?
Use a specialised search engine
to find images
about a particular subject
on the Internet.
231
****
Online access information
sources and services
Public access book databases
232
****
Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.
• The contents of most books are (still) not available on the
Internet.
• Most Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.
233
****
Public access book databases:
an overview
• (Databases by publishers.)
• Databases by book distributors / bookshops!
• Online public access library catalogues
• (Databases of computer-based versions of books.)
234
****
Public access book databases provided
by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
****Examples
Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
• Barnes and Noble (US):
http://www.bn.com/
235
***-Examples
Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/
• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/
236
***-Examples
Book databases accessible free of
charge: for old books
To find used, secondhand, rare, hard-to-find and out-of-print
books around the world:
• abebooks
http://www.abebooks.com/
• Virtual Book Shop
http://www.bookshop.com/
237
238
****
Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/
***-Examples
Example of an international
public access dissertation database
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.
239
240
****
!? Question !? Task !? Problem !?
Search for titles of books
which are relevant for you,
using an online database provided by
a book publisher or bookshop.
241
***-
Public access book databases:
evaluation criteria
(Part 1)
• Is usage free of charge?
• Wide coverage?
Also for books in your preferred language?
• Specialized coverage for particular subjects?
• Up to date? Frequent updates?
• Abstracts, summaries, descriptions, tables of contents
included?
• Full text indexing of each item in the database,
so that full text searching is possible?
242
***-
Public access book databases:
evaluation criteria
(Part 2)
• Field indexing, so that searching for the contents of a
particular field is possible? For instance:
»the title
»the date of publication
»the author
»the publisher
»the language
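Field indexing can be illustrated with a minimal sketch. The records, the names `BOOKS`, `build_field_index` and `search_field`, and the simple word splitting are all assumptions made for illustration; a real book database will be organised differently.

```python
from collections import defaultdict

# Hypothetical sample records; the field names follow the list above.
BOOKS = [
    {"title": "Gardening for beginners", "author": "Ann Example",
     "publisher": "Sample Press", "year": 1998, "language": "English"},
    {"title": "Press freedom", "author": "Bob Sample",
     "publisher": "Demo Books", "year": 2001, "language": "English"},
    {"title": "Tuinieren voor beginners", "author": "Carla Voorbeeld",
     "publisher": "Sample Press", "year": 2001, "language": "Dutch"},
]


def build_field_index(records):
    """Map (field, word) to the set of record positions containing that word."""
    index = defaultdict(set)
    for pos, record in enumerate(records):
        for field, value in record.items():
            for word in str(value).lower().split():
                index[(field, word)].add(pos)
    return index


def search_field(index, field, word):
    """Return the sorted positions of records whose field contains the word."""
    return sorted(index.get((field, word.lower()), set()))


INDEX = build_field_index(BOOKS)
print(search_field(INDEX, "title", "press"))      # "press" in the title only
print(search_field(INDEX, "publisher", "press"))  # "press" in the publisher only
```

Because each index entry is keyed on the field as well as the word, a search for "press" in the title does not return books that merely come from a publisher called "Sample Press".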
243
***-
Public access book databases:
evaluation criteria
(Part 3)
• Does the database producer improve retrieval by
»adding subject terms, or
»by classifying the books in categories?
• Powerful search options:
»truncation? stemming?
»Boolean search combinations? proximity searching,…?
»spelling check of your search terms?
»translation of your search terms
into several other languages?
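Two of these search options, truncation and Boolean combinations, can be sketched as follows. The sample titles, the `*` truncation symbol and the function names are assumptions for illustration only; each real system has its own query syntax.

```python
# Hypothetical sample titles, keyed by a record number.
TITLES = {
    1: "library catalogues online",
    2: "cataloguing rules for libraries",
    3: "online bookshops",
}


def build_index(titles):
    """Map each word to the set of record numbers whose title contains it."""
    index = {}
    for doc_id, title in titles.items():
        for word in title.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def lookup(index, term):
    """Return matching record numbers; a trailing '*' means right truncation."""
    if term.endswith("*"):
        prefix = term[:-1].lower()
        hits = set()
        for word, ids in index.items():
            if word.startswith(prefix):
                hits |= ids
        return hits
    return set(index.get(term.lower(), set()))


INDEX = build_index(TITLES)
both = lookup(INDEX, "catalogu*") & lookup(INDEX, "online")     # AND
either = lookup(INDEX, "librar*") | lookup(INDEX, "bookshops")  # OR
without = lookup(INDEX, "online") - lookup(INDEX, "librar*")    # NOT
print(both, either, without)
```

Truncation matches "catalogues" as well as "cataloguing"; the Boolean operators are then plain set operations on the lists of matching records.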
244
***-
Public access book databases:
evaluation criteria
(Part 4)
• Easy user interface?
• Is a user interface offered in your own language?
• Relevance ranking of results?
• Possibility to combine Boolean retrieval with relevance
ranking of results?
• Can results be limited to a certain time period?
• Can the results be ordered according to
date, size, origin,...?
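The combination of relevance ranking, limiting to a time period and ordering of results can be sketched as below. The records and the crude scoring rule (counting occurrences of the query terms in the title) are assumptions for illustration; real systems use more refined ranking methods.

```python
# Hypothetical sample records.
RECORDS = [
    {"title": "online searching", "year": 1995},
    {"title": "library catalogues and online catalogues", "year": 2001},
    {"title": "online catalogues", "year": 2002},
]


def score(record, query_terms):
    """Crude relevance: total occurrences of the query terms in the title."""
    words = record["title"].lower().split()
    return sum(words.count(term) for term in query_terms)


def search(records, query, year_from=None, year_to=None, order_by="relevance"):
    """Return matching titles, optionally limited to a period, in the chosen order."""
    terms = query.lower().split()
    hits = []
    for record in records:
        if year_from is not None and record["year"] < year_from:
            continue
        if year_to is not None and record["year"] > year_to:
            continue
        relevance = score(record, terms)
        if relevance > 0:
            hits.append((relevance, record))
    if order_by == "date":
        hits.sort(key=lambda pair: pair[1]["year"], reverse=True)  # newest first
    else:
        hits.sort(key=lambda pair: pair[0], reverse=True)  # most relevant first
    return [record["title"] for _, record in hits]


print(search(RECORDS, "online catalogues"))
print(search(RECORDS, "online catalogues", year_from=2000, order_by="date"))
```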
245
***-
Public access book databases:
evaluation criteria
(Part 5)
• Good presentation of each result?
• Does the system offer a current awareness service,
sending information on new titles that may be of interest
to you?
• Short response times?
246
***-
Public access book databases:
evaluation criteria
(Part 6)
• Are other services offered from the same site or with the
same interface?
Is the system integrated with other services?
Additional services can be
»searchable databases of videos, music CDs, CD-ROMs and
DVDs, all also for sale
»a subject directory for browsing, besides the database with
index for searching
»WWW-based e-mail and e-mail address directories
»auctions through WWW
247
****
Online access information
sources and services
Library
Online Public Access Catalogues
= OPACs
248
****
Online Public Access Catalogues of
libraries
• Library catalogues are useful mainly for finding older
books.
• Most are accessible online and free of charge.
249
***-
Online Public Access Catalogues
= OPACs: definition
Online Public Access Catalogue:
a term used to describe any type of computerized library
catalogue offered to the public through online access
***-Example
Online access library catalogues:
The British Library
• Accessible online via WWW:
Since 2000: http://blpc.bl.uk/
• Access free of charge
250
***-Example
Online access library catalogues:
The British Library: screenshot
251
252
****
Online access information
sources and services
Fee-based online public access
information services
253
****
Types of online access information
systems: “free” versus “fee”
• A lot of the information on the Internet is available free of
charge, but another part is only accessible when a fee is
paid to the producer and / or the distributor.
• Some organisations pay these fees for some sources and
then organise access, so that the members of the
organisation can retrieve and exploit the information as if
it were free of charge.
• The first commercial computer systems that made
information available online appeared around 1975.
• Most of them are now also available through the Internet.
254
***-Examples
Fee-based online access services:
examples (Part 1)
Name                 Location of the computer(s)
America On Line      U.S.A.
OCLC                 U.S.A.
Ovid Technologies    U.S.A.
CompuServe           U.S.A.
Cambridge            U.S.A., Taiwan, UK
Data-Star            Switzerland
Dialog               U.S.A.
EBSCO                U.S.A.
255
***-Examples
Fee-based online access services:
examples (Part 2)
Name
Location of the computer(s)
Elsevier ScienceDirect
Factiva
ISI (Web of Science, JCR,…)
LexisNexis
MSN (Microsoft)
Prodigy
Silver Platter
STN
Swets (e-journals)
...
U.S.A.
U.S.A.
U.S.A.
U.S.A., The Netherlands,...
Germany - U.S.A. - Japan
The Netherlands
...
256
***-
Online information services:
various names for similar systems
• (fee-based) online (access) information service
• (fee-based) online (access) computer service
• databank
• database vendor
• host computer
• aggregator
• ...
257
****
Online information services:
total size of their databases
In 1999:
The big host systems and the public access WWW pages
offer a comparable quantity of information:
• WWW offered about 8 terabytes (= 8 000 gigabytes) of
text data
(according to Lawrence and Giles, Nature, 1999, Vol. 400, pp. 107-109.)
• Dialog offered about 9 terabytes (= 9 000 gigabytes)
(in 1998)
»6 billion pages of text
»3 million images
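The figures above can be compared with a short calculation. It uses only the numbers quoted on this slide and decimal units (1 terabyte = 1 000 gigabytes); the average size per page is a rough derived value that ignores the share taken by the images.

```python
GB_PER_TB = 1_000  # decimal units, as on the slide (8 terabytes = 8 000 gigabytes)

web_text_tb = 8                # WWW text data, 1999 (figure from the slide)
dialog_tb = 9                  # Dialog, 1998 (figure from the slide)
dialog_pages = 6_000_000_000   # pages of text on Dialog (figure from the slide)

web_text_gb = web_text_tb * GB_PER_TB
dialog_gb = dialog_tb * GB_PER_TB
ratio = dialog_tb / web_text_tb

# Rough average only: part of the 9 terabytes consists of images, not text.
bytes_per_page = dialog_tb * 10**12 / dialog_pages

print(web_text_gb, dialog_gb, ratio, bytes_per_page)
```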
258
***-
Database hosts / distributors:
evaluation criteria
(Part 1)
• Contract required?
• Advance payment required?
• Stability / history / evolution / future of host?
• Low costs of data communication?
• Many databases available?
• Whole records available (or only parts)?
• Frequent updates?
• Whole database available? As one file or fragmented?
259
***-
Database hosts / distributors:
evaluation criteria
(Part 2)
• Price of access? Price of information?
• Powerful search options: truncation, Boolean
combinations, proximity searching,…?
• Can the indexes of more than one database be searched
simultaneously?
• Speed of retrieval?
• Relevance ranking of results?
• Fast response? Accuracy of data communication?
• Clear output format?
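Proximity searching across more than one database at the same time can be sketched as follows. The database names, the sample records and the functions `within` and `search_all` are hypothetical; a real host offers this through its own command language.

```python
def within(text, term_a, term_b, max_distance):
    """True if term_a and term_b occur at most max_distance words apart."""
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == term_a]
    positions_b = [i for i, word in enumerate(words) if word == term_b]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


# Hypothetical databases on one host, each holding a few short records.
DATABASES = {
    "db_one": ["information retrieval in libraries",
               "retrieval of chemical information"],
    "db_two": ["information about online retrieval systems"],
}


def search_all(databases, term_a, term_b, max_distance):
    """Run the same proximity query against every database."""
    return [(name, text)
            for name, records in databases.items()
            for text in records
            if within(text, term_a, term_b, max_distance)]


adjacent = search_all(DATABASES, "information", "retrieval", 1)
nearby = search_all(DATABASES, "information", "retrieval", 3)
print(adjacent)
print(len(nearby))
```

A small maximum distance keeps only records in which the two terms stand next to each other; a larger distance retrieves more records but with less precision.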
260
***-
Database hosts / distributors:
evaluation criteria
(Part 3)
• Online indication of costs?
• Easy user interface?
• Practice sessions free of charge?
• Good manuals, documentation and online help?
• Training courses available? Quality?
• Good help desk available?
• Gateway service offered?
• ...
261
***-
Databases of
online public access databases
• Example
»Gale directory of databases !
• Their coverage:
»online access databases
»(databases accessible on CD-ROM)
»...
262
***-
Databases of databases:
Gale
• Produced in U.S.A.
• Not free of charge
• Available in various formats:
»printed
»on CD-ROM
»online via the host systems Data-Star, Dialog,
with a payment required for each use
»online through the Internet through various hosts,
for a fixed price per year to be paid in advance
263
****
Online access information
sources and services
Online access databases about journal articles
264
****
Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications (for instance Emerald, Elsevier).
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.
***-Example
Online access databases
about journal articles: Ingenta (1)
• Ingenta Journals allows you to search a bibliographic
database of millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.
265
***-Example
Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Ingenta acquired Uncover in 2000.
• Available from
»http://www.ingenta.co.uk/
»http://www.ingenta.com/
266
****Example
Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text
(journal articles, journal issues, books, reports,
conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Searching is free of charge.
• Available from
http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.
267
***-Example
Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of articles from more than 20 000
journals and conference proceedings, NOT full-text.
• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.
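The principle of such a current awareness service can be sketched as below. The email addresses, journal names and the function `build_alerts` are hypothetical, and the sketch only prepares the alerts; it does not send any email.

```python
def build_alerts(subscriptions, new_issues):
    """Return {email: [(journal, issue, contents), ...]} for the selected journals."""
    alerts = {}
    for issue in new_issues:
        for email, journals in subscriptions.items():
            if issue["journal"] in journals:
                alerts.setdefault(email, []).append(
                    (issue["journal"], issue["issue"], issue["contents"]))
    return alerts


# Hypothetical user profiles: the journals that each user has selected.
SUBSCRIPTIONS = {
    "reader@example.org": {"Journal A", "Journal B"},
    "other@example.org": {"Journal C"},
}

# Hypothetical newly arrived issues with their tables of contents.
NEW_ISSUES = [
    {"journal": "Journal A", "issue": "Vol. 1, No. 2",
     "contents": ["Article one", "Article two"]},
    {"journal": "Journal C", "issue": "Vol. 5, No. 1",
     "contents": ["Article three"]},
    {"journal": "Journal D", "issue": "Vol. 9, No. 4",
     "contents": ["Article four"]},
]

ALERTS = build_alerts(SUBSCRIPTIONS, NEW_ISSUES)
for email, items in ALERTS.items():
    print(email, items)
```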
268
269
***-
!? Question !? Task !? Problem !?
Search for titles of journal articles
which are relevant for you,
in a database provided free of charge.
270
***-
Online access information
sources and services
Electronic newsletters and journals
271
***-
Electronic newsletters and journals:
introduction
• Since the end of the 1990s, electronic journals have
become a new communication medium that cannot be
neglected.
(Diagram: Author / Sender, Editor, Reader / Receiver)
272
***-
Online access information
sources and services
Conclusion
273
****
Online access information:
future trends
• An increasing amount of information is becoming available
online.
• A growing share of this online information is becoming
available free of charge.
• The quality of server and client software is improving.
A consequence is:
• An increasing number of end-users are searching for
information online.
274
****
Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.
• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.